INTRODUCTION


As doctor I know that many of my n mli. 'I -mi_l oaumed cal cude ignn. 
:ogmse *n themselves a statistic.ti ilt ’.i im. My aim, in writing this 
ok. IS to Conti ibutt toward:- cot rectum ■ f • I.. f I hope r. lull *. i fit i 11 v»* .mil 
hitable The examples have a rneiiu .il t i e. but h ive Ikm n u*«hI . ml 
iderstiiod by non medical peopy 

I h-tvr- cove»ed the ten* ■ which .-op ate I t • m. to be most useful to <• 

I leagues These' j»e the de-.i :ription >irvd preen tat ion : > I dii ile.i. .iim 
lability and significance tests and the conclusions whe n may oe drawn 

«* book is ir* 1 ei <i.,d I ;-*r thov wh • w.itr to moke i ■ t. rt iltmq .t ilt • v l 

tbs This being s;> I m.ike no apologies to the purist* tor uvi" simolifu .ition 
the numerical facts of life. 

This is mainly a programmed learning text. The information is dispensed in
small doses called 'frames', with questions and answers to test continuously
the reader's grasp of the subject matter. In order to derive maximum benefit
the reader is advised to attempt to answer the questions in each frame before
looking at the given response. For this purpose a cover for the answer column
is provided.

This hook has hivn drrk’Hoped over Inor years and its nrreitl form <• its 
a amended version Cut it isms and .advice have been ms.uiviaI . <d hut min 
:ount at eveiy stage and I would I ke to thank all those wt>a read the *est 
conscientiously during its development To a large extent these people 
ve writ! re*. this hook tea you I thank them, at the time exnrevating 

r»t from blame 'hey include med i c •.huJeiiu mil ■ n ejq.je. .-r Pie Hay.il 
t. Hospital Medic a School aid it 'he Uri.versities of Leeds, Pie:, aia ji • 

? Wit watery and In the University o* Rhodesia, besides const "ferable assist 
;:«• Rom within the Faculty of Medicine groups of .min ;jr .*!■ m- >1 
uluattb in the Faculties o* Science ar»d Emi.-itiiin have stalled and .y 
•nted uoon the parly vers a .s 

I am partiailaily g-atef I for live considerable measure r.r :*ssisTarve 
ivided initially by Dr David Hfiv/krldge, now it The Open University, and 
jsfcquently by Mi De-nvll Fa. 1 1 • . i • • t's .> La.' n the .•'!••• •••• "I 

I ..It Educetron at This University Within the Mi.rrl.:.il f a.-ailty I owe • u-»-.ii 
af to the fodx'orarvco of the a-p,itmeni ' secn '.itie , u pent hit h ■- . 

rse-nary Horn and to the errouragemen! o* mv co lean • D '. 

ttey I am indebted to the Head of my Dcpir -- ra, Pruh* .m VV Fr.i-.i-. 
iss, fnr supplying the original Irn-u co isi let aide .uppot r and .a u-. si . -. t 
. stationery vute thruol/>OJt this veutuf a 


1971 


W M CASTLE 


Chapter 1


TYPES OF RESULTS 


INTRODUCTION 


This chapter teaches you to distinguish between the two types of numerical
data, as this distinction is used over and over again during the book. Rates,
ratios, proportions and percentages are described briefly.


The height of a patient is measured.
The number of patients attending a
particular clinic is counted.

Is weight measured or counted?                 Measured

What about time of survival?                   Measured

Are people who have been vaccinated
usually measured or counted?                   Counted


Results which are measured are called
continuous or quantitative. Each
individual has one measurement from a
continuous spectrum, e.g. 4'11", 5'7",
6'2". Results which count people into
groups with certain attributes are called
discrete or qualitative.

Sex is qualitative/quantitative data.          Qualitative.


Sorry to have to describe sex as
'qualitative data' but that's life!

What kind of result is heart size?             Quantitative (because it is
                                               measured).


What kind of result is red cell volume?        Quantitative


Why is it quantitative?                        Because it is measured.


Blood groups are ........ data.                Qualitative: patients are
                                               counted into particular groups.


Write down 3 different sources of              I.Q., Examination Results
quantitative measurement, e.g. bladder         and Bank Balance are 3
capacity.                                      possibilities.
(1) ........
(2) ........
(3) ........
1.8  Sometimes examination results are
     listed qualitatively rather than
     quantitatively. How is this done?         Candidates are grouped on a
                                               pass/fail basis only.


1.9  Country of origin of doctors practicing in
     Country X

     Country            Males   Females   Total

     England & Wales     142       28      170
     Ireland              29        2       31
     Hdv                   5        3        8
     Scotland             74       15       89
     S. Africa           15?       3*      183
     U.S.A.                5        1        6
     Others                7       10       17

     Total               419       85      504

What type of data is this? 


Qualitative. 


1.10 How many doctors originated in
     England and Wales?                        170


1.11 Qualitative results are often given as a
     ratio, a proportion or a percentage.
     Any group we single out for mention can
     be said to have 'the characteristic
     mentioned'.

     If we call that group the sheep, the
     others are the goats.

     What are the goats in the last frame?     Doctors in X qualified
                                               outside England and Wales.


1.12 A ratio may be defined as

         the number of sheep
         -------------------
         the number of goats

     In Frame 1.9 what is the ratio of

         those who originated in England & Wales
         ---------------------------------------
         those who did not                     170/334

                                               334 is the number
                                               originating outside
                                               England and Wales.
1.13 What is the ratio of women to men doctors
     in Country X?                             85/419


1.14 If  (the number of sheep) / (the number of goats)  is a ratio,

     and  (the number of sheep) / (the number of sheep and goats)
     is a proportion,

     (the red cell volume) / (the total blood volume)
     is a ........                             Proportion


1.15 The ratio of women doctors to men in
     this country is 85/419.

     What is the proportion?                   85/(419 + 85) = 85/504


1.16 100 x 183/504 is the percentage (%) of all doctors in
     Country X who originated in South Africa.
     The percentage is defined as 100 times the
     ........                                  Proportion


1.17 Is the percentage therefore
     100 x sheep/goats?                        No. % = 100 x sheep/(sheep and goats)


1.18 Give names to the following indices from
     Frame 1.9. (They all refer to the U.S.A.)

     (a) 1/85   (b) 1/84   (c) 100 x 1/85      The (a) proportion,
                                               (b) ratio and (c)
                                               percentage of women
                                               doctors who qualified in
                                               the U.S.A.
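
The sheep-and-goats definitions in Frames 1.12 to 1.18 can be checked with a few lines of Python. This is only a sketch: the function names `ratio`, `proportion` and `percentage` are illustrative labels, not anything the book defines, and the counts are the ones quoted in the frames (85 women and 419 men; 183 of the 504 doctors from South Africa; 1 of the 85 women from the U.S.A.).

```python
from fractions import Fraction


def ratio(sheep, goats):
    """Sheep divided by goats."""
    return Fraction(sheep, goats)


def proportion(sheep, goats):
    """Sheep divided by sheep and goats together."""
    return Fraction(sheep, sheep + goats)


def percentage(sheep, goats):
    """100 times the proportion."""
    return 100 * proportion(sheep, goats)


# Frames 1.13 and 1.15: women (sheep) and men (goats) doctors in Country X.
print(ratio(85, 419))        # 85/419
print(proportion(85, 419))   # 85/504

# Frame 1.16: doctors from South Africa (183) against all the others (504 - 183).
print(round(float(percentage(183, 504 - 183)), 1))   # 36.3

# Frame 1.18: the 1 woman who qualified in the U.S.A. against the other 84 women.
print(proportion(1, 84), ratio(1, 84))               # 1/85 1/84
print(round(float(percentage(1, 84)), 2))            # 1.18
```

Exact fractions are used so that the ratio and the proportion print in the same form as the frames give them; the percentage is rounded only at the point of printing.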
1.19 It is perfectly in order to cancel the
     numerator and denominator where
     possible.                                 1.17% or 1.2%

! r0 This .ramwlling ir. otian 

-V H. can be written » | >|M ( «H '”**** * moused 

85 


1.20 Ratios, proportions and percentages may
     seem innocent. They are sometimes
     misused in journals. An ear, nose and
     throat surgeon proudly reports that he has
     a 100% 5 year survival rate for patients
     with cancer of the liuoal after
     operation.

     Are you impressed?                        Not until you know how
                                               many patients are involved.
                                               He may have only operated
                                               once!

                                               Unless the actual number
                                               involved is also quoted,
                                               mistrust ratios, proportions
                                               and percentages.


1.21 A normal blood picture should contain
     about 5 million red cells and 6 thousand
     white, i.e. a ratio 1000.

     Mr Van der Merwe has his blood cells
     in this ratio. Is his blood picture O.K.? Can't say. He may have
                                               1.5 million white cells and
                                               leukaemia! A ratio on its
                                               own is not enough!


1.22 Besides ratios, proportions and
     percentages, doctors quote 'rates'.

     For example, the Still-birth Rate is

         the number of still-births
         --------------------------
         the total number of births

     Is this a percentage, a proportion or a
     ratio?                                    None. It is 10 times the
                                               percentage.


1.23 Some of the so-called 'rates' in Medicine
     are not true rates, a rate being defined as

         the number counted over a certain period
         ----------------------------------------
         the total at a given time

     The Crude Death Rate is calculated from

         the number of deaths in 1 year
         -------------------------------------------------
         the estimated population on July 1 in that year

     Is this a proper rate?                    Yes. (The definition is
                                               usually multiplied by 1000.)
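
As a rough illustration of Frame 1.23, the crude death rate can be worked out as below. The function name `crude_death_rate` and the figures are hypothetical, chosen only to show the arithmetic; the multiplier of 1000 follows the frame's remark that the rate is usually quoted per 1000.

```python
def crude_death_rate(deaths_in_year, mid_year_population, per=1000):
    """Deaths counted over one year, divided by the population at one
    instant (the July 1 estimate), expressed per `per` people."""
    return per * deaths_in_year / mid_year_population


# Hypothetical figures, not taken from the book.
print(crude_death_rate(1200, 100000))   # 12.0 deaths per 1000 population
```

The numerator is counted over a period and the denominator is taken at an instant, which is what makes this a true rate rather than a proportion.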
1.24     The number of deaths by accident in 1966
         ----------------------------------------
         The number of survivors in 1966

     is a rate/ratio/proportion?               Ratio


1.25 Sheep / Total  is a ........              Proportion

     Sheep / Goats  is a ........              Ratio

     Sheep counted over a period
     ---------------------------  is a ........  Rate
     Total at an instant


1.26 Ratios, rates, proportions and percentages
     are all means of expressing what kind of
     data?                                     Qualitative.


1.27 If you are presented with any of these
     values you should also be told what?      The number involved in the
                                               survey or trial


1.28 Occasionally in medical journals the terms
     ratio, proportion and rate are confused.
     One may read that the ratio of A to B is
     10%, instead of one to ........           Nine

                                               A x 100 / (A + B) = 10%
                                               A / (A + B) = 1/10
                                               so 10A = A + B
                                               9A = B
                                               A / B = 1/9


1.29 The proportion of A is defined as  A / Total, say  A / (A + B).
1.30 If the proportion of A increases and this is

         A / (A + B)

     the proportion of B must increase/decrease
     at the same time.                         Decrease


1.31 I read in a journal:

     'It is particularly interesting to see that
     as the proportion of live eggs increases
     the proportion of dead eggs decreases.'
     Comment.


                                               It is inevitable rather than
                                               interesting. Like a salary
cheque, as the proportion 

. Mi:»♦•rives m opo 

money must 
Keep your eyes
open and you will notice
this type of mistake in
the journals.
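
The point of Frames 1.28 to 1.31 can be seen numerically. The sketch below uses made-up counts for two groups A and B (the helper name `proportions` is an assumption for illustration); whatever the counts, the two proportions add up to 1, so one cannot rise without the other falling.

```python
from fractions import Fraction


def proportions(a, b):
    """Return the proportion of A and the proportion of B in a total of A + B."""
    total = a + b
    return Fraction(a, total), Fraction(b, total)


# Hypothetical counts: as the proportion of A rises, the proportion of B
# has to fall, because the two always add up to 1.
for a, b in [(50, 50), (70, 30), (90, 10)]:
    p_a, p_b = proportions(a, b)
    print(p_a, p_b, p_a + p_b)

# Frame 1.28: a proportion of 1/10 for A is a ratio of A to B of one to nine.
p_a, p_b = proportions(1, 9)
print(p_a, p_a / p_b)   # 1/10 1/9
```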



1.32 Before learning about how to present
     qualitative results in journals (and how
     not to!) in the next chapter, check that
     you have learned all the main points in
     this chapter in the Revision Summary
     below.


SUMMARY 


Results which are counted into groups are called qualitative.

Quantitative values are measured. The difference is important from the
statistical point of view.

Qualitative data can be summarised as

ratios  sheep/goats  and proportions  sheep/(sheep and goats),  where the
percentage (%) is 100 times the proportion.

A rate is similar to a proportion but its denominator is a static measurement
whereas the numerator is counted over a period of time.

It is wrong to state one of these indices without quoting the number involved.

It is wrong to take much notice of them in the absence of this information.

Keep a look out for their misuse.
Chapter 2


ILLUSTRATING COUNTS 


INTRODUCTION 

Sometimes research workers spend a lot of their time obtaining results and
then present them poorly in journals. Illustrating data is a useful topic as
badly presented results can mislead you. Diagrams should aid the reader by
saving him time and by highlighting the points. Qualitative and Quantitative
data are presented differently.

2.1  There are 4 ways of presenting qualitative
     data:

     (a) Pie diagram
     (b) Pictogram
     (c) Bar chart
     (d) Proportional bar chart

     These 4 methods are used below to
     illustrate the fact that

     45% of Europeans have blood group O
     41% have blood group A
     10% have blood group B
      4% have blood group AB

     Can you label the methods? (You will
     need to rely on an inspired guess.)


     [Figure: the four methods, unlabelled, drawn as diagrams a, b, c and d]

     (a) is a ........                         (a) Pie diagram
     (b) is a ........                         (b) Proportional bar chart
     (c) is a ........                         (c) Bar chart
     (d) is a ........                         (d) Pictogram

                                               If your answers are correct
                                               go straight to Frame 2.5.


2.2  Why is (a) called a pie diagram?          It is pie shaped.


2.3  Why is (c) called a bar chart?            It is shaped like a series
                                               of bars.


2.4  Why is (b) called a proportional bar
     chart?                                    It is a bar chart with sub-
                                               divisions proportioned off.


2.5  (d) is a pictogram. The 'gram' infers
     measurement and 'picto' a picture. What
     picture would you use to represent the
     composition of cow's milk by volume?      I asked you what you would
                                               use; you are entitled to
                                               choose. A milk bottle or urn
                                               are possibilities. Some
                                               Picasso amongst you may
                                               choose to draw a cow.


2.6  What method of representation is this?

     [Figure: an unlabelled diagram]           Bar chart, horizontal
                                               this time
2.7  Frame 2.6 represents some results. Is there
     any way of knowing what they are about?   There is no way of knowing
                                               unless you are psychic


2.8  One principle in illustrating results is to
     give the diagram a ........               Heading or title


2.9  The diagram in Frame 2.6 depicts the
     composition of cow's milk by weight:

     88% is water
      3% is protein
      4% is fat
      5% is carbohydrate

     Besides labelling the heading what else
     should be labelled?                       The various sections


2.10 The diagrams below give 2 further labels
     besides the heading and components.

     What are they?                            The author and the date

     [Figure caption: Areas in which doctors in Country X are
     practicing. Figures collected by Professor W. F. Ross, 1967]


2.11 One of the principles of presenting data is
     to label fully. What should be labelled?

     (1) (2) (3) (4)                           Heading, components,
                                               author, date.


2.12 To recap, what are the 4 main ways of
     presenting diagrammatically data which
     is qualitative?                           Pie diagram, pictogram,
                                               bar chart and proportional
                                               bar chart
2.13 Besides being fully labelled, or self-
     explanatory, diagrams must be easy to
     understand. They should be as simple as
     possible, and not overcrowded with
     information. Look at the diagram below.

     How would you improve it?

     [Figure: one diagram combining Female and Male with Retired and
     Practicing. Caption: Areas in which doctors in Country X are
     practicing. Figures collected by Professor W. F. Ross, 1967
     (but not presented like this!)]

                                               Pie diagram.

                                               There is too much on one
                                               diagram. Split it into two.

2.14 Simplicity is the second principle in
     presenting results. A good diagram will
     save a lot of words in the text. The
     diagram in Frame 2.10 is/is not a simple      Is
     diagram because it would/would not save       Would
     a lot of words in the text.


2.15 What may be described as the first 2
     principles in depicting data?             Full labelling.
                                               Simplicity


2.16 There is a third: honesty. I don't mean
     just telling the truth but also the whole
     truth and nothing but the truth.

     There must be no attempt to mislead and
     justice must be seen to be done.

     A manufacturer of a hair shampoo, 'Coo',
     gave a free sample to 10 film stars. 1 lost
     the sample and the other 9 used theirs.
     The manufacturer's claim: '9 out of 10
     film stars use Coo shampoo.'
     Is he right?                              You say yes.

                                               The slogan is dishonest and
                                               misleading although it
                                               contains the truth I suppose.

                                               You say no.

                                               You are not gullible;
                                               you are probably a very
                                               good cheat
2.17 How can you be dishonest with
     proportions and ratios, etc.?             By failing to record the
                                               total number considered


2.18 The same principles of presentation apply
     to quantitative data and moreover it is
     easier to be dishonest with that type of
     data. Apart from honesty, what other
     principles are involved in illustrating
     data?                                     Full labelling and simplicity


2.19 One of the reasons for diagrammatic
     presentation of data is to make the points
     clearer. What is the other?               To save words.


2.20 What methods do you know for
     illustrating counts?                      Pie diagram,
                                               Bar chart,
                                               Proportional bar chart,
                                               Pictogram


2.21 Why would you use these methods?          To save words and make
                                               the points clear


2.22 To what principles would you adhere?      Full labelling, simplicity
                                               and honesty


2.23 Practical Example

     Use a proportional bar chart to present
     the data in Frame 1.9.


SUMMARY

I hope your example is fully labelled and is as simple as possible. Honesty is
the other criterion but this will be followed up further in the next
chapter.

To present qualitative data diagrammatically, pie diagrams, pictograms, bar
charts and proportional bar charts are used. They are intended to save words
and make points clearer.
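
For anyone drawing the pie diagram of Frame 2.1 by hand or by machine, each percentage becomes its share of the 360 degrees in a circle. The following is a minimal sketch under that assumption; the name `pie_angles` is illustrative, and the blood group percentages are those quoted in Frame 2.1.

```python
def pie_angles(percentages):
    """Angle in degrees of each slice: the percentage's share of 360."""
    return {name: pct * 360 / 100 for name, pct in percentages.items()}


# Blood groups of Europeans, as quoted in Frame 2.1.
blood_groups = {"O": 45, "A": 41, "B": 10, "AB": 4}

# A pie (or a proportional bar chart) only makes sense if everyone is counted.
if sum(blood_groups.values()) != 100:
    raise ValueError("the percentages should account for everyone")

for name, angle in pie_angles(blood_groups).items():
    print(f"{name:>2}: {blood_groups[name]:>2}%  {angle:5.1f} degrees")
```

The same shares, applied to the length of a single bar instead of the angle of a circle, give the sub-divisions of a proportional bar chart.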
Chapter 3


ILLUSTRATING MEASUREMENTS


INTRODUCTION 

Quantitative results are generally more cumbersome than qualitative to
represent diagrammatically. It is easier to represent sheep and goats than to
distinguish diagrammatically a 200 lb. sheep from a 240 lb. sheep.


3.1  Birth weights of 12 babies of mothers
     found to be suffering from sugar diabetes

     (Fictitious data)

     103 oz    131 oz    143 oz    122 oz
     138 oz    146 oz    114 oz    139 oz
     161 oz    172 oz    138 oz    170 oz

     What kind of data is this?                Quantitative


3.2  The characteristic which is varying is
     called, not surprisingly, the variable.
     What is the variable in the last frame?   The birth weight of babies
                                               of diabetic mothers.


3.3  Such results as in Frame 3.1, from large
     samples, are grouped before illustration.
     See below. Is there more information in
     the data after grouping?

     Data from Frame 3.1 Grouped

     Group No.    Group           Frequency
         1         80 -  99 oz        0
         2        100 - 119 oz        3
         3        120 - 139 oz        5
         4        140 - 159 oz        3
         5        160 - 179 oz        1
         6        180 - 199 oz        0

                                               No. For example no
                                               distinction is made now
                                               between the 122 oz and
                                               138 oz birth weights.
3.4  The price to pay for being able to
     illustrate measurements is loss of some
     information. The number in each group
     is called the frequency of that group.
     What is the frequency in the group
     120-139 oz in the last frame?             5


3.5  All the frequencies considered together
     form the frequency distribution. The
     frequency distribution in Frame 3.3 is
     0, 3, 5, 3, 1, 0. What is the frequency
     distribution below?                       0, 1, 1, 1, 7, 3, 2, 1, 0

     Birth weights of 16 babies of normal
     mothers. (Fictitious data)

      52 oz    103 oz    109 oz    127 oz
      79 oz    104 oz    111 oz    149 oz
      80 oz    104 oz    120 oz    150 oz
     100 oz    106 oz    121 oz    162 oz

     The above data Grouped

     Group No.    Group           Frequency
         1         20 -  39 oz        0
         2         40 -  59 oz        1
         3         60 -  79 oz        1
         4         80 -  99 oz        1
         5        100 - 119 oz        7
         6        120 - 139 oz        3
         7        140 - 159 oz        2
         8        160 - 179 oz        1
         9        180 - 199 oz        0


3.6  What is the variable in the last frame?   Birth weight of babies of
                                               normal mothers.


3.7  Why is haemoglobin level a variable?      Because it varies
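
The grouping step of Frames 3.3 to 3.5 is mechanical enough to sketch in code. The function below is an illustration only (the name `frequency_distribution` and its parameters are assumptions, not the book's); it uses the 16 birth weights of Frame 3.5 and groups 20 oz wide starting at 20 oz.

```python
def frequency_distribution(values, first_lower=20, width=20, n_groups=9):
    """Count how many values fall in each group of equal width.

    Group k covers first_lower + k*width up to, but not including,
    first_lower + (k+1)*width, so 20-39, 40-59, ... for whole ounces.
    """
    frequencies = [0] * n_groups
    for value in values:
        k = (value - first_lower) // width
        if 0 <= k < n_groups:
            frequencies[k] += 1
        else:
            raise ValueError(f"{value} falls outside the groups")
    return frequencies


# Birth weights (oz) of the 16 babies of normal mothers in Frame 3.5.
weights = [52, 103, 109, 127, 79, 104, 111, 149,
           80, 104, 120, 150, 100, 106, 121, 162]

print(frequency_distribution(weights))   # [0, 1, 1, 1, 7, 3, 2, 1, 0]
```

Once the counts are made, the individual weights within a group can no longer be told apart, which is the loss of information Frame 3.4 describes.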
3.8  Having grouped the data we can present
     it more easily in a diagram. Below the
     results from Frame 3.3 are partly filled
     in. Complete the diagram.

     [Figure: partly completed histogram; vertical axis: No. of babies
     in each group; horizontal axis: Wt. in oz]


3.9  This method of illustration is called
     a histogram. Instead of 'No. of babies in
     each group' we could write ........       Frequency


3.10 In your own words describe a histogram
     and what it is used for.                  A histogram is a method of
                                               presenting quantitative data.
                                               Along the horizontal axis is
                                               the variable and up the vertical
                                               axis is the frequency. The
                                               histogram is a series of boxes
                                               standing side by side. The
                                               size of the box indicates the
                                               frequency in the group
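
A rough text version of a histogram can be printed from a frequency distribution. This sketch is drawn sideways, with the variable running down the page and the frequency running across, purely because that is easy to print; the function name `text_histogram` is an assumption, and the figures are the grouped birth weights of Frame 3.3.

```python
def text_histogram(lower_bounds, frequencies, width=20, unit="oz"):
    """Return one line per group: the group label, then one '#' per result."""
    lines = []
    for lower, frequency in zip(lower_bounds, frequencies):
        label = f"{lower:>3}-{lower + width - 1:<3} {unit}"
        lines.append(f"{label} | {'#' * frequency}")
    return lines


# Grouped birth weights from Frame 3.3 (babies of diabetic mothers).
lower_bounds = [80, 100, 120, 140, 160, 180]
frequencies = [0, 3, 5, 3, 1, 0]

for line in text_histogram(lower_bounds, frequencies):
    print(line)
```

Each row of '#' marks plays the part of one box, and because the groups adjoin one another there are no gaps between the rows.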


3.11 Draw a histogram to illustrate the data
     in Frame 3.5.

     [Figure: the completed histogram; horizontal axis: Birth Wt.]
3.12 Another method of presenting measured
     data is the frequency polygon. If the mid-
     points of the box lids are joined by a
     series of straight lines on the histogram
     you have just drawn, you have the
     equivalent frequency polygon.

     Complete:

     [Figure: histogram to be completed as a frequency polygon;
     horizontal axis: Birth Wt. in oz]


3.13 The boxes are not shown on a frequency
     polygon, only the series of straight lines.
     If you wanted to show two frequency
     distributions such as Frames 3.3 and
     3.5 on the same diagram, would you use
     2 histograms or 2 frequency polygons?     2 frequency polygons,
                                               otherwise the boxes would
                                               overlap


3.14 The frequency polygon is always
     continued until it meets the horizontal
     axis. Put another way, the frequency of
     the outside groups included in a frequency
     polygon is always ........                Zero
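
The points to be joined in a frequency polygon can be listed from the same grouped data. In this sketch the mid-point of a group is taken as its lower limit plus half the group width (so the 80-99 oz group is treated as running from 80 up to 100); that convention, and the name `frequency_polygon_points`, are assumptions made for illustration.

```python
def frequency_polygon_points(lower_bounds, frequencies, width=20):
    """Return (mid-point, frequency) pairs for a frequency polygon.

    An empty group is added at either end if the outside groups are not
    already empty, so that the polygon comes down to the horizontal axis.
    """
    bounds = list(lower_bounds)
    freqs = list(frequencies)
    if freqs[0] != 0:
        bounds.insert(0, bounds[0] - width)
        freqs.insert(0, 0)
    if freqs[-1] != 0:
        bounds.append(bounds[-1] + width)
        freqs.append(0)
    return [(lower + width / 2, f) for lower, f in zip(bounds, freqs)]


# Frame 3.3 already lists empty outside groups (80-99 and 180-199 oz).
print(frequency_polygon_points([80, 100, 120, 140, 160, 180], [0, 3, 5, 3, 1, 0]))

# The same distribution without them: the zero groups are added back.
print(frequency_polygon_points([100, 120, 140, 160], [3, 5, 3, 1]))
```

Both calls give the same list of points, beginning and ending with a frequency of zero, which is the rule stated in Frame 3.14.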
3.15 30 spastic children had the following
     I.Q.s:

     I.Q.           Frequency
      60 -  69          3
      70 -  79          6
      80 -  89         11
      90 -  99          7
     100 - 109          4

     Construct below a frequency polygon to
     represent this frequency distribution.

     [Figure: frequency polygon; horizontal axis: I.Q.]


3.16 What are the 3 principles for good
     presentation of all results?              Full labelling,
                                               simplicity, honesty.


3.17 What is the commonest dishonest method
     used with qualitative data?               Not stating how many
                                               results were included in
                                               quoting percentages etc.


3.18 With quantitative data, cheating is very
     easy. I will teach you 3 tricks. 1 such trick
     is to suppress the zero,

     e.g. Weight loss on Silty tablets
     [Figure: weight against Days on Treatment, drawn with the zero
     suppressed]

     Re-sketch the diagram without suppressing
     the zero (i.e. mark 0 on both axes).

     [Figure: the same results re-drawn; vertical axis 0 to 250,
     horizontal axis: Days on Treatment, 0 to 5]

                                               Not so impressive is it?

     Although zero must always be shown on a
     diagram, sometimes the axis can be
     condensed as follows:

     [Figure: vertical axis marked 180 to 210 with a break below it,
     labelled 'This means some values have been omitted']


3.19 Closely related to suppressing the zero is
     the trick of inflating or exaggerating the
     scale, e.g.

     [Figure: blood alcohol level of convicted drivers in mg
     per 100 ml of blood]

     On a more reasonable scale this becomes

     [Figure: the same results; axis: Blood alcohol level]

     What is a reasonable scale?               A reasonable scale is one
                                               which neither over-
                                               emphasises nor under-
                                               emphasises the evidence.

3.20 What is wrong with the diagram below?
     It shows the percentage of erythrocytes
     haemolysed in various concentrations of
     salt solution (Wintrobe's method).        It infers that more than 100%
                                               of the red cells survived at
                                               low concentrations. This is
                                               impossible.

     [Figure: horizontal axis: % Conc. NaCl Sol., 0.30 to 0.50]


3.21 The trick in Frame 3.20 is technically
     called extrapolation (extending the line
     beyond the actual results).

     What other 2 tricks do you know?          Suppressing the zero.
                                               Inflating the scale.
3.22 What tricks have been employed in this
     diagram showing a patient's increase in
     haemoglobin level after therapy with
     the drug Ironical?                        All 3

     [Figure: haemoglobin level against time on treatment]


3.23 Which are the 3 commonest tricks used to
     illustrate quantitative data dishonestly? Suppressing the zero.
                                               Inflating the scale.
                                               Extrapolation.


3.24 What methods do you know for
     illustrating measurements?                Histograms and Frequency
                                               polygons


3.25 What methods do you know for
     illustrating counts?                      Pie diagrams,
                                               Bar charts,
                                               Proportional bar charts,
                                               Pictograms


3.26 The difference between a bar chart and a
     histogram is that the ........            Histogram
     is used when data is measured and the
     ........ is used for ........ data.       Bar chart, Qualitative
     In the histogram the groups adjoin
     each other and the lines approximate
     each other whereas with the bar chart
     the groups are usually ........           Separate
3.27 Ideally between 10 and 25 classes should
     be used in a histogram. How many are
     used here?                                18

     [Figure: histogram]


3.28 Sketch the frequency polygon that would
     be constructed from the histogram in the
     preceding frame. It would look like
     a curve.

     [Figure: smooth curve; vertical axis: Frequency]

                                               The diagram is the subject
                                               matter of the next chapter


3.29 Practical Example

     Sketch the data from Frames 3.3 and
     3.5 on the same diagram.

     Comment.                                  Babies of diabetic mothers
                                               seem to be bigger!
SUMMARY

Before presenting measurements the results are grouped. The number in each
class is called the frequency. The frequency in all the classes is called the
frequency distribution. This grouping makes illustration easier, but the price
is some loss of detail. Ideally, 10 to 25 different groups can be used.
The frequency distribution is represented by a histogram or frequency
polygon. When 2 or more frequency distributions are superimposed it is
better to use frequency polygons.

The principles for presenting all data are full labelling, simplicity and honesty.
The commonest tricks with quantitative data are suppression of the zero,
inflating the scale and extrapolation.


                           Data
                            |
            +---------------+---------------+
            |                               |
       Qualitative                     Quantitative
            |                               |
            |                             Group
            |                               |
     Pie diagram                        Histogram
     Bar chart                          Frequency polygon
     Pictogram
     Proportional bar chart
Chapter 4


THE NORMAL DISTRIBUTION 


INTRODUCTION 

In the last section we imagined the shape of a curve constructed from a
particular histogram. Most biological and medical sources of quantitative
results either follow that curve or can be modified easily so that they do.
Because it occurs so commonly it is important to understand it. (Without its
formula, that is!)


4.1  A factor which varies is called a ....   Variable.


4.2  In the pure sciences such as Physics and
     Chemistry there is not so much inherent
     variability as in Biology and Medicine. One
     chemical carbon atom is much like another
     carbon atom but when they are
     biologically arranged the effects can range
     from 'stunning' to 'mediocre' and even
     beyond.

     How tall are you?                         Your height is ......


4.3  I am 5 ft 3 in. Are you the same height
     as I am?                                  Probably not


4.4  If your height is different from mine,
     which of us is abnormal?                  Neither of us so far as
                                               height is concerned!


4.6  Do you know more people over 6 ft tall
     than under 6 ft tall?                     No.


4.7  Do you know more adults less than 5 ft
     tall than over 5 ft tall?                 No
4.8  The majority of adults are between
     5 ft 4 in and 5 ft 8 in tall. Complete the
     histogram.

     [Figure: histogram to be completed; horizontal axis: Height]


4.9  If we take many more groups and
     sketch the curve for people's height, it
     could look like this:

     [Figure: smooth curve; horizontal axis: Height]

     This is called the normal distribution.

     It is shaped like a ........              It is often described in
                                               textbooks as being bell-shaped
     It applies to qualitative/quantitative
     data.                                     Quantitative


4.10 A symmetrical curve is one with 2 sides
     of the centre absolutely corresponding.
     The curve of the normal distribution
     is/is not symmetrical.
4.11 Which of the following are symmetrical?

     [Figure: several curves, labelled (a), (b), (c) ...]
                                               (a) and (c) are


4.12 Describe the normal curve.                It is bell shaped and
                                               symmetrical


4.13 Relative to the base line a convex surface
     is arched and a concave surface is hollow.

     [Figure: two surfaces, A and B]

     A is ........
     B is ........                             Concave, Convex


4.14 In this diagram indicate the part which is
     convex and the parts which are concave.

     [Figure: normal curve with points marked on it]
                                               Between B and C it is arched
                                               or convex, otherwise it is
                                               hollow or concave


4.15 The point where a convex section of a
     curve changes to a concave section is
     called a point of inflection.

     How many points of inflection has a
     normal distribution curve?                2: B and C in the
                                               last frame.
                                               Is not


4.17 Why?                                      The curve remains convex
                                               at point D.


4.18 How many points of inflection has Chad?

     [Figure: Chad]                            2 only, on his nose


4.19 This centipede has ... points of
     inflection and is/is not symmetrical.

     [Figure: centipede]                       4 on each surface.
                                               Is not


4.20 For a variable to have a frequency
     distribution like the normal distribution
     the majority/minority have a measure
     near the middle and the majority/         Majority.
     minority have measures near the extreme.  Minority


4.21 Is income distributed like the normal
     distribution?                             No. The majority have a
                                               small wage and the minority
                                               (like doctors) a large wage
4.22 However many variables based on living
     objects can be approximately described as
     being normally distributed. Height is one
     example, write down 3 others.             e.g. Weight, Bladder Capacity,
                                               Haemoglobin level,
                                               Examination marks usually,
                                               etc.

                                               It is fairly hard to think of
                                               variables which are not
                                               approximately normally
                                               distributed


4.23 Why do you think the normal
     distribution is so called?                Because it is the distribution
                                               normally encountered


4.24 Describe the normal curve.                Symmetrical, bell shaped,
                                               2 points of inflection

SUMMARY 

Variation between individuals is a natural phenomenon, thank goodness.
The fact that most variables either follow the normal distribution directly
or can easily be adjusted to do so is very useful and enables numbers to be
used to answer theories. The normal distribution is symmetrical and bell
shaped and has 2 points of inflection where the shape of the curve changes
from convex to concave or vice versa.
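
The two properties in this summary can be checked numerically. The book deliberately leaves out the formula of the normal curve, so the sketch below brings in the standard textbook formula (with the middle at 0 and unit spread) as an outside assumption; `bell_curve` and `count_inflections` are illustrative names. "Arched" and "hollow" are used as in Frame 4.13.

```python
import math


def bell_curve(x):
    """Height of the standard normal curve at x (middle at 0, unit spread)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def count_inflections(f, start=-4.0, stop=4.0, steps=80):
    """Count how often the curve switches between arched and hollow.

    The sign of the second difference f(x-h) - 2 f(x) + f(x+h) says whether
    the curve is hollow (positive) or arched (negative) around x.
    """
    h = (stop - start) / steps
    xs = [start + i * h for i in range(steps + 1)]
    hollow = [f(xs[i - 1]) - 2 * f(xs[i]) + f(xs[i + 1]) > 0
              for i in range(1, steps)]
    return sum(a != b for a, b in zip(hollow, hollow[1:]))


print(count_inflections(bell_curve))            # 2
print(bell_curve(1.5) == bell_curve(-1.5))      # True: the curve is symmetrical
```

The count comes from a grid of points rather than from calculus, so it is a check on the shape and not a proof.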
Chapter 5


NOTATION


INTRODUCTION

You are now to be introduced to symbols which will save a lot of words
later. If your arithmetic is such that you understand this chapter's summary
you can skip the chapter.


5.1  N is used for the number of results.

     What is N in Frame 3.1?                   12


5.2  Usually X is used instead of an individual
     result. What is the last X in Frame 3.1?  170 oz


5.3  If we have two results for each patient, e.g.
     a height and a weight, we usually call one
     X and one Y.

     X and Y then represent two factors which
     vary or are ........                      Variables.


5.4  How many X's are there in any set of
     results?                                  N


5.5  Σ is capital S in Greek. It is pronounced
     'sigma'.

     Σ means add together all the results.

     What does Σ(1, 2, 2, 3) equal?            8


5.6  ΣX means add together all the values of X.

     It is pronounced ........ X.              Sigma

     What is ΣX for these 5 fictitious
     haemoglobin levels?

     80, 90, 100, 110, 120                     500


5.7  Below are corresponding values of X and
     Y for 5 students. X is haemoglobin level
     and Y is intelligence quotient if you like.

     X:  80, 90, 100, 110, 120
     Y:  80, 90, 100, 110, 170

     N in each case equals ........            5

     ΣY equals ........                        550
5.8  What do ΣX/N and ΣY/N equal in the
     previous frame?                           ΣX/N = 500/5 = 100
                                               ΣY/N = 550/5 = 110


5.9  ΣX/N is the average of the X variable.

     Its symbol is X̄, called X bar.

     What is the result ΣY/N called?           The average of Y or Ȳ
                                               or Y bar


5.10 The average in statistical jargon is
     usually called the mean.

     Ȳ is the ........ of all the Y.           Mean.


5.11 X̄ in Frame 5.6 = 100.

     What does √X̄ equal?                       10


5.12 If 4 values of X are 2, 4, 6, 8, which
     symbol equals 4?                          N = 4

     Which symbol equals 5?                    X̄ = 5

     Which symbol equals 20?                   ΣX = 20


5.13 When you see a capital sigma you do
     what?                                     Add the results together


5.14 Sometimes we will want the results
     squared before adding.

     We then write Σ(X²) or Σ(Y²).

     If Y is 1, 2, 2, 3,

     Σ(Y²) = ........                          1 + 4 + 4 + 9 = 18

                                               You perform the task in
                                               brackets first; that is what
                                               the brackets mean.


5.15 (X - X̄) means what?                       An individual result minus
                                               the mean
5.16 (X - X̄) is called the deviation from the
     mean. What is the last value of the
     deviation from the mean in Frame 5.6?     120 - 100 = 20


5.17 It is a rule that you perform the task
     shown in brackets first.

     What is Σ(X - X̄) in Frame 5.6?            (-20) + (-10) + 0 + (+10)
                                               + (+20) = 0


5.18 Ȳ in Frame 5.7 is 110.

     What is Σ(Y - Ȳ) in that frame?           0


5.19 Σ(X - X̄) and, of course, Σ(Y - Ȳ)
     always equals zero.

     ........ always equals zero.              The sum of the deviations
                                               from the mean
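
Frames 5.16 to 5.19 can be tried out directly. The helper names `mean` and `deviations` below are illustrative; the figures are the five values of X from Frame 5.6 and the five values of Y from Frame 5.7.

```python
def mean(values):
    """X bar: the sum of the results divided by N."""
    return sum(values) / len(values)


def deviations(values):
    """Each individual result minus the mean."""
    m = mean(values)
    return [x - m for x in values]


x = [80, 90, 100, 110, 120]   # haemoglobin levels, Frame 5.6
y = [80, 90, 100, 110, 170]   # second variable, Frame 5.7

print(deviations(x), sum(deviations(x)))   # [-20.0, -10.0, 0.0, 10.0, 20.0] 0.0
print(deviations(y), sum(deviations(y)))   # [-30.0, -20.0, -10.0, 0.0, 60.0] 0.0
```

With results that are not whole numbers a computer may report a sum a hair away from zero because of rounding; the rule itself is exact.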



5.20 In algebra XY equals a value of X
     multiplied by its corresponding value Y.
     If X is 1, 2, 3 while the 3 corresponding
     values for Y are 1, 3, 4 the 3 values of
     XY are ........                           1 x 1 = 1, 2 x 3 = 6,
                                               3 x 4 = 12
     and Σ(XY) = ........                      Σ(XY) = 1 + 6 + 12 = 19

     Remember, always perform the task in
     brackets first.


5.21 If X is 1, 2, 2, 3,

     X̄ equals ........                         2

     and Σ(X - X̄)² equals ........             (-1)² + (0)² + (0)² + (+1)² = 2

                                               Remember a negative number
                                               squared gives a positive answer


5.22 If Y is 0, 2, 3, 3,

     Ȳ equals ........                         2

     and Σ(Y - Ȳ)² equals ........             6
5.28 This is all the notation you need to know
throughout this book.

To revise it,

where X equals 1, 2, 4, 5
and Y equals 0, 0, 2, 2

N in both cases equals ......                 4

$\Sigma X$ equals ......                      12

$\Sigma Y$ equals ......                      4

$\bar{X}$ equals ......                       3

$\bar{Y}$ equals ......                       1

$\Sigma(X^2)$ equals ......                   46

$(\Sigma X)^2$ equals ......                  144

$\Sigma(XY)$ equals ......                    18

$(\Sigma X)(\Sigma Y)$ equals ......          48

$\Sigma(X - \bar{X})$ equals ......           0, of course.

$\Sigma(Y - \bar{Y})^2$ equals ......         4
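
The revision frame above can also be checked mechanically. The following is a minimal sketch in Python (the variable names are illustrative choices, not part of the programme) that evaluates each piece of notation for the same X and Y:

```python
# Revision data from Frame 5.28
X = [1, 2, 4, 5]
Y = [0, 0, 2, 2]

N = len(X)                                        # number of results
sum_x = sum(X)                                    # sigma X
sum_y = sum(Y)                                    # sigma Y
mean_x = sum_x / N                                # X bar
mean_y = sum_y / N                                # Y bar
sum_of_squares_x = sum(x ** 2 for x in X)         # sigma(X squared): square first, then add
square_of_sum_x = sum_x ** 2                      # (sigma X) squared: add first, then square
sum_xy = sum(x * y for x, y in zip(X, Y))         # sigma(XY): multiply pairs, then add
product_of_sums = sum_x * sum_y                   # (sigma X)(sigma Y)
sum_dev_x = sum(x - mean_x for x in X)            # sigma(X - X bar)
sum_sq_dev_y = sum((y - mean_y) ** 2 for y in Y)  # sigma(Y - Y bar) squared

print(N, sum_x, sum_y, mean_x, mean_y)
print(sum_of_squares_x, square_of_sum_x, sum_xy, product_of_sums)
print(sum_dev_x, sum_sq_dev_y)
```

The position of the brackets decides the order of the operations, which is why `sum_of_squares_x` and `square_of_sum_x` are computed differently and give different values.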


5.29 Practical Example

Choose any 10 different values for X for the table below (i.e. N = 10).
Using these numbers calculate in the spaces provided

(a) $\dfrac{\Sigma(X - \bar{X})^2}{N - 1}$

(b) $\dfrac{\Sigma(X^2) - \dfrac{(\Sigma X)^2}{N}}{N - 1}$

The two answers should be equal.
Chapter 6


MEASURING THE MIDDLE


INTRODUCTION

Rather than talk about a set of results in terms of all the individual values
we can summarise the information. To do this we need to be able to describe
3 things:

1. the shape of the results (e.g. the normal distribution, considered in an
   earlier Chapter);

2. the middle of this distribution (considered in this Chapter); and

3. the degree of variation (considered in the next Chapter).

Describe the normal distribution.

It is symmetrical and bell
shaped with 2 points of
inflection.


The middle of the normal distribution is
the mean. What is the symbol for
the mean?


The mean is the measure of the middle of
the distribution as all the results tend to
lie about the mean. Is the 'range' of
results a measure of the centre?

No.


We are going to discuss 3 measures
indicating the centre. One is the mode,
one is the median, and the other is the
......

Mean.


If results are listed in order of size they
are called an array. An examination list in
alphabetical order is/is not an array.

Is not, unless the Aarons
are at the top of the class
and the A'okanakas at
the bottom, etc.


Are the results given in Frame 5.6
an array?


Do the results need to be arrayed before
the mean is calculated?

6.8

The median is the middle value in an
array.

What is the median in Frame 5.6?

100

6.9

The results quoted in Frame 5.7 are
repeated here:

X  80, 90, 100, 110, 120

Y  80, 90, 100, 110, 170

What are the 2 values of the median?

100, the same in each.

6.10

The extreme value, 170, does not affect
the median. Are the means in both
distributions the same?

No.

$\bar{X}$ is 100

$\bar{Y}$ is 110

6.11

Not only do the extreme values affect the
mean but an extreme value has a greater/
lesser effect than one near to the middle.

Greater.

6.12

When we are interested in whether cases
fall in the upper or lower half of the
distribution, and not particularly in how
far they are from the central point, we
use the median/the mean.

The median.

6.13

The mean uses all the information
available. Does the median?

No. It is not such a reliable
measure. The result 170
does not affect it.

6.14

Which is easier to calculate, the median
or the mean?

The median, especially if
the results are already
arrayed.

6.15

Choosing between the median and the
mean, where applicable, we know:

The ...... is more reliable and is,
in fact, usually used.

Mean.

The ...... is easier to calculate
than the ......

Median.

Mean.

The ...... is not affected by
extreme values.

Median.

To calculate the ...... you need to
array the results first.

Median.
6.16 Are these sedimentation rates an array?
7, 13, 11, 15, 10, 70

No.


6.17 What is the value of N in the
previous frame?

6


6.18 So far, when we have calculated the
median, N has been an odd number so
that there has only been one middle
value.

Is N odd or even in Frame 6.16?

Even.


6.19 When N is even the median is the average
or mean of the middle two values.
What is the median in Frame 6.16?
(Hint: array the results first.)

12. The average of
11 and 13.


6.20 X is 1, 1, 1, 2, 2, 3, 4, 4 above.
What is the mean value?

What is the value of the median?

The distribution is
1, 1, 1, 2, 2, 3, 4, 4

Mean = 2¼

Median 2


6.21 The mode is the value which occurs
most frequently, the most fashionable
number. What is the mode in the
last frame?

1
6.22 In this diagram the mode, median and
mean are/are not the same.

Are.

The distribution is
0, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4


6.23 What is the value of the median, the mode
and the mean in this normal distribution?

120

All these measures of the
middle are equal in the
normal distribution.


6.24 The normal distribution is bell-shaped
and symmetrical about the
......, ...... and ......

Mean, median, mode.


6.25 The ...... and ......
are easier to calculate than the ......
if the results are arrayed.

Mode and Median.


6.26 What is the mode?

The value occurring most
frequently, the most
fashionable, the peak
on a graph.


6.27 The camel can be said to be bimodal.
Why?

When it has 2 maxima
or bumps.
6.29 The mode is the least valuable measure
of the middle. Most sets of results follow
the normal distribution and the mean is
the most accurate measure. However,
sometimes in research one comes upon a
distribution with 2 modes. This is a
useful sign that 2 groups of very
dissimilar people are mixed together
(statistically speaking, not an orgy!),
that the group is probably heterogeneous.
For example, 2 chief groups of anaemias
are macrocytic (with big cells) and
microcytic (with small cells).
Sketch cell size in anaemias.

[Sketch: distribution of cell size in all cases of anaemia, with the
microcytic anaemias labelled.]

Really 2 normal distributions
with some overlap.


6.30 The most reliable value of the centre is
the ...... It is/is not the only
measure of the middle which uses all the
information.

Mean.

Is.

Unless we wish to know the most
typical value, in which case we would use
the ......; or we wish to know
whether cases fall in the top half or the
lower half, when the ...... is used,
it is best to calculate the ......

Mode.

Median.

Mean.


In this array:

1, 2, 3, 3, 4, 5, 10, 12

3 is the value of the ......

Mode.

5 is the value of the ......

Mean.

and ...... is the value of the ......

3½

Median.


SUMMARY

The 3 most useful measures of the centre are the mean, the median and
the mode.

The mean, or average, has the symbol $\bar{X}$ and is the most reliable measure.

It is markedly affected by extreme values.

The median and mode are calculated after the results are placed in order
of size in an array.

The median is the middle value in an array (or the average of the middle 2
values if N is even). It is usually used if we are particularly interested in
whether cases fall in the upper or the lower half of the distribution.

The mode is easy to distinguish, being the value which occurs most
frequently. If a frequency distribution is bimodal it usually means that the
group is not homogeneous and 2 very different groups are mixed together.
In the normal distribution the mean, the median and the mode fall in the
same place.
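
The three measures of the centre can be checked against the frames of this chapter. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative only.

```python
from statistics import mean, median, multimode

# Results repeated in Frame 6.9
x = [80, 90, 100, 110, 120]
y = [80, 90, 100, 110, 170]

# The extreme value 170 moves the mean but leaves the median alone
print(mean(x), median(x))
print(mean(y), median(y))

# Frame 6.16: N is even, so the median is the average of the middle two
# values once the results have been arrayed (median() sorts them itself)
sedimentation_rates = [7, 13, 11, 15, 10, 70]
print(sorted(sedimentation_rates), median(sedimentation_rates))

# The final array of the chapter: mode, mean and median all differ
array = [1, 2, 3, 3, 4, 5, 10, 12]
print(multimode(array), mean(array), median(array))

# multimode() returns every value tied for most frequent, so a list with
# 2 entries is one simple hint that a set of results may be bimodal
print(multimode([1, 1, 1, 2, 3, 4, 4, 4]))
```

`multimode` is used rather than `mode` because it reports ties, which is the situation the bimodal frames describe.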
Chapter 7


MEASURING THE VARIATION


INTRODUCTION

So far we have summarised a set of results by describing the shape and centre.
In this chapter we discuss how to measure variation.


7.1 Below we have 2 normal distributions,
each with the same mean. 1 set of
results varies more than the other;
it has a bigger scatter. Is it distribution
A or distribution B?

Distribution A. The results
in distribution B cluster
more closely about the
mean. A has more variation
between results.


7.2 There were 3 measures of the centre
discussed. What were they?

The mean, the mode and
the median.


7.3 Similarly there are 3 measures of
variation. The first is the range.
The range is the difference between the
largest and smallest result. Does it use
all the available information?

No.


7.4 What is the range in Frame 5.6?

120 - 80 = 40


7.5 Complete the gaps.

1, 2, 3, 3, 5, 9 is a distribution where 3
is the ...... and the ......

Median. Mode.

and where 8 is the ......

Range.


The mean is reliable because it uses all
the information. Is the range a reliable
measure of variation?

No, it does not use all the
information. It is normally
only used when the median
is used to measure the centre.
7.8 In the previous frame draw a horizontal
line joining the points of inflection. This
length can be used as a measure of
variation. If the results are very scattered
the line is longer/shorter than if they
cluster around the mean.


7.9 In fact the length of this horizontal line
from a point of inflection to the mean is
called the standard deviation. It is a very
useful measure of variation. Draw in the
standard deviation below.


7.10 The standard deviation is often given the
symbol s. We need to calculate this value.
We know it does not equal $\Sigma(X - \bar{X})$.
Why?

This value always equals 0.


7.11 Therefore $\dfrac{\Sigma(X - \bar{X})}{N}$ is also equal to 0,
so cannot be used as a measure of
variation either. In fact, s has a nastier
formula:

$s = \sqrt{\dfrac{\Sigma(X - \bar{X})^2}{N - 1}}$

$s^2$ is the other measure of variation and
is called the variance.

Give the formula for the variance.

$s^2 = \dfrac{\Sigma(X - \bar{X})^2}{N - 1}$


7.12 $s = \sqrt{\dfrac{\Sigma(X - \bar{X})^2}{N - 1}}$

You have met the $\Sigma(X - \bar{X})^2$ part before
at the end of Chapter 5 and you know
that N is what?

The number
of results.


7.13 Where X is 0, 1, 2, 2, 5

$\bar{X}$ = ......

$\Sigma(X - \bar{X})^2$ = ......

$\bar{X} = \dfrac{\Sigma X}{N} = 2$

$\Sigma(X - \bar{X})^2$ = 4 + 1 + 0 + 0 + 9
= 14


7.14 Where X is 0, 1, 2, 2, 5,

$\Sigma(X - \bar{X})^2$ = 14 and $s^2$ = ......

14/4 = 3½


7.15 $(X - \bar{X})$ is called ......
(In words.)

The deviation from the mean.


7.16 State the formula $s^2 = \dfrac{\Sigma(X - \bar{X})^2}{N - 1}$
in words.

The variance is the sum of
the squares of the
deviations from the mean
divided by one less than
the number of results.


7.17 $\sqrt{\dfrac{\Sigma(X - \bar{X})^2}{N - 1}}$

is the formula for s,
which is called what?

The standard deviation.
7.18 Complete this calculation of the
standard deviation from Frame 5.6
by deciding the values of a, b, c, d, $s^2$
and s ($\bar{X} = 100$).

    X      $(X - \bar{X})$    $(X - \bar{X})^2$
    80         -20                400
    90         (a)                (b)
   100           0                  0
   110         +10                100
   120         +20                400

$\Sigma(X - \bar{X})^2$ = (c)

$s^2 = \dfrac{\Sigma(X - \bar{X})^2}{N - 1} = \dfrac{(c)}{(d)}$

a = -10
b = 100
c = 1000
d = N - 1

$s^2 = 250$

$s = \sqrt{250}$


7.19 It is easier to calculate the range than
the standard deviation. Why is the
standard deviation a better measure?

It uses all the information.


7.20 What is the symbol for the
standard deviation and its formula?

$s = \sqrt{\dfrac{\Sigma(X - \bar{X})^2}{N - 1}}$


7.21 $s^2$ is the ......

What is its formula?

Variance.

$s^2 = \dfrac{\Sigma(X - \bar{X})^2}{N - 1}$


7.22 If X is 1, 1, 2, 3, 3

What is the range?

What is $\bar{X}$?

What is the value of the variance?
What is the value of the standard
deviation?

Range 2
$\bar{X}$ = 2

$s^2 = \dfrac{(-1)^2 + (-1)^2 + (0)^2 + (1)^2 + (1)^2}{4} = 1$

$s = \sqrt{1} = 1$


7.23 If X is still 1, 1, 2, 3, 3

What is $\Sigma(X^2)$?

What is $(\Sigma X)^2$?

$\Sigma(X^2)$ = 24

$(\Sigma X)^2 = 10^2 = 100$
7.24 What does

$\dfrac{\Sigma(X^2) - \dfrac{(\Sigma X)^2}{N}}{N - 1}$

equal?

$\dfrac{24 - \dfrac{100}{5}}{4} = \dfrac{4}{4} = 1$


7.25 When X is 1, 1, 2, 3, 3

$\dfrac{\Sigma(X - \bar{X})^2}{N - 1} = 1$ (from Frame 7.22)

and

$\dfrac{\Sigma(X^2) - \dfrac{(\Sigma X)^2}{N}}{N - 1} = 1$ (from Frame 7.24)

In fact, whatever the distribution,
it is always arithmetically true that

$\dfrac{\Sigma(X - \bar{X})^2}{N - 1} = \dfrac{\Sigma(X^2) - \dfrac{(\Sigma X)^2}{N}}{N - 1}$

$\sqrt{\dfrac{\Sigma(X^2) - \dfrac{(\Sigma X)^2}{N}}{N - 1}}$ is another
formula for calculating ......

s, the standard deviation.
In fact, you saw this to
be so in the practical
example in Frame 5.29.
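
The agreement between the two formulae can be tried on any set of results. Below is a minimal sketch in Python; the function names are illustrative, and the three sets of results are those used in Frames 7.22, 7.18 and 7.29.

```python
from math import isclose, sqrt


def variance_from_deviations(results):
    """s squared as the sum of squared deviations from the mean over N - 1."""
    n = len(results)
    mean = sum(results) / n
    return sum((x - mean) ** 2 for x in results) / (n - 1)


def variance_without_mean(results):
    """s squared from sigma(X squared) and (sigma X) squared; no mean needed."""
    n = len(results)
    sum_x = sum(results)
    sum_of_squares = sum(x ** 2 for x in results)
    return (sum_of_squares - sum_x ** 2 / n) / (n - 1)


for results in ([1, 1, 2, 3, 3], [80, 90, 100, 110, 120], [2, 3, 3, 4, 5]):
    v1 = variance_from_deviations(results)
    v2 = variance_without_mean(results)
    # The standard deviation is the square root of either variance
    print(results, round(v1, 4), round(v2, 4), round(sqrt(v1), 4))
    assert isclose(v1, v2)
```

The second function is the more convenient one by hand when the mean is not a whole number, because no awkward deviations have to be squared.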


7.26


Which of these, if any, is a correct
formula for the standard deviation?


I a I 

0»> 


(c) 


Aix 7 : 

f N 1 




(Ixr 


I<x 2 i 

N 


N - 1 



<d> 


<«> 


<0 

<9? 



v r N 1 

/ax x > 2 

x/ N 1 


(d) and (g).

(a) has the summation sign
inside the bracket.
(b) has the wrong denominator.
(c) has both brackets in
the wrong place.
(e) omits the square.
(f) omits the square
root sign.


7.27


One formula for the standard
deviation does not need to have the
mean calculated first.
What is it?

$s = \sqrt{\dfrac{\Sigma(X^2) - \dfrac{(\Sigma X)^2}{N}}{N - 1}}$


7.28


If the mean is a whole number, it is
easier to use the formula

$\sqrt{\dfrac{\Sigma(X - \bar{X})^2}{N - 1}}$

for s.

If $\bar{X}$ was calculated to be 2.3816 it
would be easier to use which formula?

The one not requiring
the calculation of
deviations from the mean,

i.e. $s = \sqrt{\dfrac{\Sigma(X^2) - \dfrac{(\Sigma X)^2}{N}}{N - 1}}$


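7.29 If X is 2, 3, 3, 4, 5

$\bar{X}$ = 3.4

Complete the following table.

    X     $X - \bar{X}$    $(X - \bar{X})^2$    $X^2$
    2        -1.4            ......              4
    3        -0.4            0.16                9
    3        -0.4            0.16                ......
    4        ......          ......              16
    5        +1.6            2.56                ......

$\Sigma X$ = ......   $\Sigma(X - \bar{X})$ = 0   $\Sigma(X - \bar{X})^2$ = ......   $\Sigma(X^2)$ = ......

The completed table:

    X     $X - \bar{X}$    $(X - \bar{X})^2$    $X^2$
    2        -1.4            1.96                4
    3        -0.4            0.16                9
    3        -0.4            0.16                9
    4        +0.6            0.36                16
    5        +1.6            2.56                25

   17         0              5.20                63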
7.30 In the last frame, the variance $\dfrac{\Sigma(X - \bar{X})^2}{N - 1}$ = ......

$\dfrac{5.20}{4} = 1.30$

and the variance $\dfrac{\Sigma(X^2) - \dfrac{(\Sigma X)^2}{N}}{N - 1}$ = ......

$\dfrac{63 - \dfrac{17^2}{5}}{4} = 1.30$


7.31 Give the formula for the standard
deviation without using the mean.

$\sqrt{\dfrac{\Sigma(X^2) - \dfrac{(\Sigma X)^2}{N}}{N - 1}}$


7.32 The standard deviation for I.Q. is about
15. What is the variance for I.Q.?

About 225.


7.33 A frequency distribution in a journal
looks like this. Describe it as fully
as possible.

[Frequency distribution of haemoglobin; horizontal axis marked 70, 100, 110, 130.]

Shape: normal
Mean/Median/Mode 100
Standard deviation 10
Variance 100
Range 60


7.34 Like the ......, which measures
the middle of a distribution, the standard
deviation and variance use all the data.
They are more reliable measures than the
...... and ......, which
measure the middle, and the ......,
which measures variation.

Mean.

Median and Mode.

Range.


7.35 State 2 formulae for the variance.

$\dfrac{\Sigma(X - \bar{X})^2}{N - 1}$

$\dfrac{\Sigma(X^2) - \dfrac{(\Sigma X)^2}{N}}{N - 1}$
7.36 In an examination you would probably
be given any formulae which are harder
than these. However, the standard
deviation and variance are so important
that you could be expected to remember
them. Can you?

Yes, I hope.


7.37 If the value of $\bar{X}$ is not a whole number,
which formula would you use for the
variance?

$\dfrac{\Sigma(X^2) - \dfrac{(\Sigma X)^2}{N}}{N - 1}$


7.38 If Y is the variable rather than X, what
would be the formula for the standard
deviation without using $\bar{Y}$?

$\sqrt{\dfrac{\Sigma(Y^2) - \dfrac{(\Sigma Y)^2}{N}}{N - 1}}$


7.39 One measure of variation does not use
all the information. It is the
......, which is used when the
...... (also unreliable) is used as
the measure for the centre.

Range.

Median.


7.40 Draw a normal distribution with mean 60,
standard deviation 4 and range 30.


7.41 In journals you often see written:

the mean ± the standard deviation.
For example 100 ± 15 for I.Q., seen
in an article, indicates what?

That the mean I.Q. is 100
and the standard deviation
is 15.
7.42 Practical Example

Results in reality are not so arithmetically convenient as in this programme.
They are usually more like those in Frames 3.1 and 3.5. Calculate the mean
and standard deviation in Frame 3.1 and Frame 3.5. We will use the answers
in Chapter 17 when we test to see whether babies of diabetic mothers are
significantly bigger than those of normal mothers, so your efforts now will
be put to practical use later. In fact, arithmetically you could perform
significance tests based on means already, but I want you to understand what
you are doing because the tests are then much more interesting.




SUMMARY


Measures of variation are used to indicate the spread of the curve. The
variance, $s^2$, and its square root, the standard deviation, s, are the measures
of choice; the range is occasionally used.

The range of the results is easy to calculate but has the same disadvantage as
the median and mode: that it is not a reliable measurement. It is used as a
measure of variation when the median is used as a measure of the centre.

The range is the difference between the highest and lowest values.
The formulae to calculate the variance are

$\dfrac{\Sigma(X^2) - \dfrac{(\Sigma X)^2}{N}}{N - 1}$

or

$\dfrac{\Sigma(X - \bar{X})^2}{N - 1}$

which are numerically identical. The standard deviation in the normal
distribution is the horizontal measurement from the mean to a point of
inflection and equals the square root of these formulae.
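
That the two variance formulae are numerically identical is not an accident of the examples chosen. As a sketch of the reasoning, using only $\bar{X} = \Sigma X / N$, the numerators can be shown to be equal by expanding the square:

```latex
\begin{aligned}
\Sigma (X - \bar{X})^2
  &= \Sigma \left( X^2 - 2\bar{X}X + \bar{X}^2 \right) \\
  &= \Sigma (X^2) - 2\bar{X}\,\Sigma X + N\bar{X}^2 \\
  &= \Sigma (X^2) - \frac{2(\Sigma X)^2}{N} + \frac{(\Sigma X)^2}{N} \\
  &= \Sigma (X^2) - \frac{(\Sigma X)^2}{N}
\end{aligned}
```

Dividing both sides by N - 1 gives the two forms of the variance.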
Chapter 8


WHAT CORRELATION MEANS


INTRODUCTION

So far we have learned to describe qualitative data in terms of ratios and rates
etc., and quantitative data in terms of the shape, middle and variation of the
frequency distributions. When 2 or more different variables are measured on
the same people to see whether one is associated with the other (a common
practice in medical research) we describe the results in terms of correlation.


Correlation does not mean causation. The
probability of getting lung cancer and the
number of cigarettes smoked have been
shown to be correlated. Does this mean
smoking causes lung cancer?

No. No more than lung cancer
causes smoking. It does show,
though, that there is an
association between smoking
and lung cancer.

Correlation means which of the following?

(a) association,

(b) causation,

(c) living with relatives!

(d) tied together,

(e) acting the same way.

(a)

Height and weight are correlated.

Increase in weight is associated with
increase/decrease in height.

Increase.

When an increase in one variable is
associated with an increase in another,
correlation is said to be positive. A
decrease in one variable associated with
an increase in another is negative
correlation. Size in shoe and size in hat
are correlated how?

Positively.

Time spent in bed and time spent in
studying are ...... correlated.

Negatively, unless you
study in bed!

How are the I.Q. of parent and I.Q. of
child correlated?

Positively.








Only 2 variables are considered
simultaneously in terms of correlation.
When this is the case the information
may be represented on a 'scatter diagram'.
What are the 2 variables in this scatter
diagram?

[Scatter diagram: X axis and Y axis each marked from -5 to +5, dividing the
diagram into quadrants I, II, III and IV; points (a) and (b) are labelled.]

X and Y


Each dot in the last frame is where a value
of X corresponds to a value of Y. At point
(a) X is -5 while Y is ......

4


At point (b) ...... is positive and
...... is negative.

Y

X


In the quadrant marked II, X is ......
and Y is ......

-ve.

+ve.


In quadrant I, X and Y are both positive
in comparison with quadrant III, where
X and Y are both negative. When most
points lie in quadrants I and III the
correlation is positive/negative.

Positive.


Which quadrants would contain most
dots in negative correlation?

II and IV.
















In Frame 8.7 correlation is ......

-ve.


8.14 When no correlation exists the dots in the
s...... d...... occur
roughly the same amount in all quadrants
(like a non specific rash!)

Scatter diagram.


8.15 Draw the axes in this scatter diagram.

Is correlation positive or negative?

[Scatter diagram with the axes drawn in, showing quadrants I to IV.]

Positive.


8.16 The axes are only guide lines and need not
be shown. Manage without them and
complete the following:

(a) is ...... correlation     (a) -ve.

(b) is ...... correlation     (b) -ve.

(c) is ...... correlation     (c) no.
8.17 So far we have learned to describe the
direction or the sign of the correlation.
Correlation also has a descriptive factor:
its size. It is maximum when a value
of X is specific to a single value of Y
(i.e. a straight line). The more the points
are scattered about an imaginary line the
less the correlation becomes, until when
the points are scattered all over the place
there is no correlation at all. It is not the
slope that determines the degree of
correlation but how closely the points
are to a straight line.

Correlation is/is not greater in (a) than (b).

Is.


8.18 Correlation is maximum in which of the
following:

(i) (a) only?

(ii) (b) only?

(iii) both (a) and (b)?

Both (a) and (b), as in both
the points lie exactly on a
straight line. A state of affairs
very rarely met in reality.

The slope itself is what is
meant by regression, but we
do not discuss regression in
this book.


8.19 Put this information into a scatter diagram
and answer the questions.

X: Height of father        Y: Height of eldest adult son

£ 6 - 

(&' 

fi 8 " 

69 ' 

69 ' 

)V 

7 f 

W 

73 ’ 

71 " 


con Id on opposite page 
SUMMARY

Correlation means association. It can be positive, negative or nonexistent.
Positive correlation is where the variables tend to increase in size together.
Where 2 variables are involved a scatter diagram may be used to represent
the data. In positive correlation the dots tend to lie in the upper right
hand and lower left hand quadrants, while in negative correlation the dots
tend to lie in the other 2 quadrants. There is no preponderance of dots in
any quadrant where correlation does not exist. The magnitude of correlation
is indicated by how closely the dots approximate to a straight line (i.e., how
narrow the scatter is about an imaginary line) and not by their slope.
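
The quadrant rule can be tried out numerically. The sketch below is illustrative only: it assumes the axes are drawn through the mean of X and the mean of Y, the data are hypothetical, and the function names are not part of the programme.

```python
def quadrant_counts(xs, ys):
    """Count the dots in quadrants I to IV, with axes through the two means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    counts = {"I": 0, "II": 0, "III": 0, "IV": 0}
    for x, y in zip(xs, ys):
        if x == mean_x or y == mean_y:
            continue  # a dot lying on an axis belongs to no quadrant
        if x > mean_x and y > mean_y:
            counts["I"] += 1      # upper right
        elif x < mean_x and y > mean_y:
            counts["II"] += 1     # upper left
        elif x < mean_x and y < mean_y:
            counts["III"] += 1    # lower left
        else:
            counts["IV"] += 1     # lower right
    return counts


def direction(counts):
    """Sign of the correlation suggested by where most of the dots lie."""
    same_sign = counts["I"] + counts["III"]
    opposite_sign = counts["II"] + counts["IV"]
    if same_sign > opposite_sign:
        return "positive"
    if opposite_sign > same_sign:
        return "negative"
    return "none apparent"


# Hypothetical results, for illustration only
xs = [1, 2, 4, 5, 6, 8]
ys = [2, 1, 5, 3, 7, 6]
counts = quadrant_counts(xs, ys)
print(counts, direction(counts))
```

Counting quadrants only suggests the sign; it says nothing about how closely the dots lie to a straight line, which is what the size of the correlation measures.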
Chapter 9


MEASURING CORRELATION


INTRODUCTION

You will learn here about 2 ways of measuring correlation. You would
usually not be expected to remember these formulae, but having been given
them you would be expected to be able to use them.


One measure of correlation is called the
Pearson correlation coefficient and its
symbol is 'r'.

[Scatter diagram of Y against X.]

You would expect r to equal ......
here.

0


How are most variables distributed?

Normally.


In order to calculate 'r' and for the value
to be meaningful both variables involved
must be distributed normally. May 'r' be
calculated between height and weight?

Yes. Both are distributed
normally.


Should 'r' between I.Q. and income be
calculated? Their frequency distributions
are given below.

[Frequency distributions of I.Q. and of income.]

No.

Income does not follow the
normal distribution shape.
9.5 Complete this diagram with r its maximum
value but negative, by adding 2 dots.

In a straight line.


9.6 r is ...... than its maximum
value here and its sign is ......


9.7 Give a formula for s without looking
back to Chapter 7 if possible.

$\sqrt{\dfrac{\Sigma(X - \bar{X})^2}{N - 1}}$

or

$\sqrt{\dfrac{\Sigma(X^2) - \dfrac{(\Sigma X)^2}{N}}{N - 1}}$


9.8 The denominator for r is

$\sqrt{\dfrac{\Sigma(X - \bar{X})^2}{N - 1}} \times \sqrt{\dfrac{\Sigma(Y - \bar{Y})^2}{N - 1}}$

which is what?

The standard deviation of X
multiplied by the standard
deviation of Y.
Symbolically $s_x s_y$.


9.9 Write $s_x s_y$ without using the means.

$\sqrt{\dfrac{\Sigma(X^2) - \dfrac{(\Sigma X)^2}{N}}{N - 1}} \times \sqrt{\dfrac{\Sigma(Y^2) - \dfrac{(\Sigma Y)^2}{N}}{N - 1}}$
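9.10 Is $\Sigma(X - \bar{X})^2$ the same as
$\Sigma(X - \bar{X})(X - \bar{X})$?

Yes.


9.11 The numerator for 'r' is

$\dfrac{\Sigma(X - \bar{X})(Y - \bar{Y})}{N - 1}$

Comment.

It is very similar to the
formula for the variance.
It is, incidentally, called
the covariance for X and Y.


9.12 As

$\dfrac{\Sigma(X - \bar{X})^2}{N - 1} = \dfrac{\Sigma(X^2) - \dfrac{(\Sigma X)^2}{N}}{N - 1}$

do you think

$\dfrac{\Sigma(X - \bar{X})(Y - \bar{Y})}{N - 1} = \dfrac{\Sigma(XY) - \dfrac{(\Sigma X)(\Sigma Y)}{N}}{N - 1}$ ?

It does.

$r = \dfrac{\text{covariance of X and Y}}{s_x s_y}$


9.13 So, without using means (as very rarely
are both $\bar{X}$ and $\bar{Y}$ whole numbers),

r = ?

$r = \dfrac{\dfrac{\Sigma(XY) - \dfrac{(\Sigma X)(\Sigma Y)}{N}}{N - 1}}{\sqrt{\dfrac{\Sigma(X^2) - \dfrac{(\Sigma X)^2}{N}}{N - 1}} \sqrt{\dfrac{\Sigma(Y^2) - \dfrac{(\Sigma Y)^2}{N}}{N - 1}}}$


9.14 In fact the N - 1 term in the numerator
can be cancelled with the N - 1 terms
in the denominator, so that

$r = \dfrac{\Sigma(XY) - \dfrac{(\Sigma X)(\Sigma Y)}{N}}{\sqrt{\left[\Sigma(X^2) - \dfrac{(\Sigma X)^2}{N}\right]\left[\Sigma(Y^2) - \dfrac{(\Sigma Y)^2}{N}\right]}}$

To what does N refer?

The number of X results,
or the number of Y results,
i.e. the number of pairs of
results (points on the scatter
diagram).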
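
The cancelling of the N - 1 terms can be written out step by step. In this sketch $S_{xy}$, $S_{xx}$ and $S_{yy}$ are shorthand introduced here only for convenience (they are not symbols used in the programme): $S_{xy} = \Sigma(XY) - (\Sigma X)(\Sigma Y)/N$, $S_{xx} = \Sigma(X^2) - (\Sigma X)^2/N$ and $S_{yy} = \Sigma(Y^2) - (\Sigma Y)^2/N$.

```latex
r = \frac{\dfrac{S_{xy}}{N - 1}}
         {\sqrt{\dfrac{S_{xx}}{N - 1}}\,\sqrt{\dfrac{S_{yy}}{N - 1}}}
  = \frac{\dfrac{S_{xy}}{N - 1}}
         {\dfrac{\sqrt{S_{xx}\,S_{yy}}}{N - 1}}
  = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}}
```

The two square roots of N - 1 in the denominator multiply to give N - 1, which then cancels with the N - 1 in the numerator.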
9.15 This formula looks awful but we will use
it now to prove that it is not too hard to
use! (For your convenience it has been
reproduced on a pull out card at the
back of the programme.)

To use this formula you need/need not
know the means of X and Y.

Need not.


9.16 We will use this formula to calculate
the value for r in this scatter diagram.

[Scatter diagram: Y axis marked 0 to 3, X axis marked 0, 2, 4, with 3 points.]

We expect here the sign of 'r' to be
...... and 'r' to equal its maximum value.

+ve.


9.17 Translate these 3 results to complete
the table below.

    X     Y
    0    (b)
   (a)    2
    4     3

N = (c)

(a) = 2

(b) = 1

(c) = 3, the number
of pairs
of results.


9.18 Complete the table
on the opposite page.
9.18 contd.

    (X)     $(X^2)$     (Y)     $(Y^2)$     (XY)
     0        0          1       (b)          0
     2       (a)         2        4           4
     4       16          3        9          (c)

$\Sigma(X)$ = 6   $\Sigma(X^2)$ = 20   $\Sigma(Y)$ = (d)   $\Sigma(Y^2)$ = (e)   $\Sigma(XY)$ = (f)

(a) = 4
(b) = 1
(c) = 12
(d) = 6
(e) = 14
(f) = 16


9.19 Using the formula in the pullout and the
totals you have already calculated, i.e.

N = 3
$\Sigma(X)$ = 6
$\Sigma(Y)$ = 6
$\Sigma(XY)$ = 16
$\Sigma(X^2)$ = 20
$\Sigma(Y^2)$ = 14

what is the value of the numerator in
calculating r?

$16 - \dfrac{6 \times 6}{3} = 16 - 12 = 4$


9.20 What is the value of the denominator in
calculating r?

$\sqrt{\left(20 - \dfrac{6^2}{3}\right)\left(14 - \dfrac{6^2}{3}\right)} = \sqrt{8 \times 2} = \sqrt{16} = 4$


9.21 The maximum value for r is ......

+1 (from Frames 9.19
and 9.20)


9.22 If you ever calculated r to be 5, what
would be your conclusion?

You had made an
arithmetical error,
because 1 is its maximum value.
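
The pull-out formula translates directly into a few lines of code. This is a minimal sketch in Python (the function name `pearson_r` is an illustrative choice), applied to the 3 pairs of results from Frame 9.17:

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from sums only; the means are never calculated."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must contain the same number of results")
    n = len(xs)                                   # number of pairs of results
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # sigma(XY)
    sum_x2 = sum(x ** 2 for x in xs)              # sigma(X squared)
    sum_y2 = sum(y ** 2 for y in ys)              # sigma(Y squared)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


# The 3 points of Frame 9.17 lie exactly on a straight line
r = pearson_r([0, 2, 4], [1, 2, 3])
print(r)
```

The function would divide by zero if every X (or every Y) were the same, since such results have no variation to correlate; a value outside the range -1 to +1 would, as in Frame 9.22, signal an arithmetical error.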
9.23 [Scatter diagram]


9.24 The Pearson correlation coefficient has
a sign and a numerical value.

What do these signify?


9.25 Guess which of the following 6 values
for 'r' is correct here:

0.1, -0.5, -1.0,

-0.9, 0.5, 0.9


9.26 You can use the formula to calculate 'r'
in the last frame by completing
this table.


on 



iV 4 l 

|XV> 



4 





4 



1 





1 





1 


1 





e 




32V < 

lrv i • 




N - 


The lowest possible value.


The sign signifies
the direction of line of the
points and the numerical
value signifies how closely
the points lie to a straight
line.


0.9


i(X> 11 KYI* 13 
1(X : I«31 Iiv’l 41 
itXYl 13. N b 

co/'.'O on vptxwte pagt' 
9.30 Here are the fictitious 2nd M.B. marks
for 5 candidates. (b) was bottom in
Anatomy and was ranked 5th or 5.

(c) is ranked ...... in anatomy.              3rd or 3.


9.31 


Candidate 

ttarft in 

At\ ff tv* 

Mark m 
Phyvology 

M 

HO 

TO 

b 

■to 

iiO 



00 


u 

36 

• 

46 

SO 


Complete this table for
the ranks from
the last frame.

Candidate    Rank in Anatomy    Rank in Physiology
(a)          1                  1
(b)          5                  2½
(c)          3                  2½
(d)          2                  5
(e)          4                  4

             Total 15           Total 15


9.32 When 2 candidates have the same score,
what happens to their ranking?

(e.g. (b) and (c) in Physiology in the
last frame.)

They each get the arithmetical
average of the rankings
they would have taken if
there had been a slight
difference between them. This
keeps the totals in the ranking
columns the same, e.g. if there
is a joint top both rank 1½, the
average of 1st and 2nd.
9.33 Had (e) got 60 instead of 50 for
Physiology in Frame 9.30 what would
his rank have been?

b, c and e would, had there
been a slight difference,
have been 2nd, 3rd and 4th.
As it is they now all rank
the average

$\dfrac{2 + 3 + 4}{3}$

= 3rd or 3

The sum of the ranks still
equals 15. Notice that after
these joint 3rd's comes the
5th.


9.34 D is the difference between a candidate's
2 rankings. N is the number of pairs of
results (candidates).

Use the formula for $\rho$ given below to
complete the calculation for the 2nd
M.B. results in Frame 9.30.

As an arithmetical check
you should always find that,
besides both ranks summing
to the same totals, $\Sigma(D)$ also
equals zero.

Candidate    Rank in Anatomy    Rank in Physiology    D       $D^2$
(a)          1                  1                     0       0
(b)          5                  2½                    2½      6¼
(c)          3                  2½                    ½       ¼
(d)          2                  5                     -3      9
(e)          4                  4                     0       0

             $\Sigma$ 15        $\Sigma$ 15           $\Sigma(D)$ = 0    $\Sigma(D^2)$ = 15½

N = ?

$\rho = 1 - \dfrac{6\Sigma(D^2)}{N(N^2 - 1)}$ = ?

N = 5

$\rho = 1 - \dfrac{6 \times 15½}{5 \times 24}$

= 1 - 0.775
= 0.225

9.35 $\rho$ is usually used if no real score can be
assigned but orders of preference can be
given, e.g.

2 surgeons discuss the various operations
for gall stones. They are unable to give
an actual numerical value for the
relative efficiency of the operations so
they ...... them and calculate ......

rank; $\rho$


9.36 These are the results. What is the value
of $\rho$?


v 


LMMiWI 

fat a 

2i*J ii.|nv< 4 

O 

a* 

A 





a 

2 

* 



c 

i 





4 

1 




T M» ■ HI 

*AU| to 

1(01 - 

lio’c 


N = ......
$\rho$ = ......


N = 4

$\Sigma(D^2)$ = 18

(Notice, as a check, $\Sigma(D)$
always = 0.)


$\rho = 1 - \dfrac{6 \times 18}{4 \times 15}$

$\rho = -0.8$


9.37 A 3rd surgeon thinks operation B is the
best and that the 3 others have equal
merit. What rank values would he give
the operations?

A = ......, B = ......, C = ......, D = ......?

(Check that the rank total is the same
as for the other surgeons.)

A = 3
B = 1
C = 3
D = 3


9.38 The range of $\rho$ is the same as r.

The maximum value of $\rho$ is ......

With no correlation $\rho$ is ......

Like r, $\rho$ has a ...... and a ...... value.

+1

0

sign, numerical


9.39 ...... is not so accurate as ...... for
measuring correlation as it does not take
into account the actual obtained result.

$\rho$; r

It is used when ranks are the
only measures available or
when the results are not
normally distributed.


9.40 $\rho$ is the Rank Order Correlation
Coefficient described by ......
r was described by ......

Spearman.

Pearson.
9.41 Which correlation coefficient is easier to
calculate?

Spearman's.


9.42 To calculate these correlation coefficients
you need ...... variables measured
on a group of ...... people. When r
is used both variables must be distributed
......; it is the more accurate
measure as the results themselves are used.
Sometimes it is only possible to state a
preference, in which case ...... is
calculated.

2

N

Normally.

$\rho$


9.43 Correlation coefficients of 1 rarely occur
in biology and medicine. One of the
nearest is that between live weight and
warm dressed weight of poultry where
r = +0.98.

Incidentally, the scatter diagram below
represents r equal to +0.6, which will help
to give you some idea of the correlation
coefficient size.

[Scatter diagram: sisters' height against brothers' height.]


9.44 Practical Example

Below,

X signifies erythrocyte sedimentation rate;
Y signifies the number of leucocytes in
thousands.

Both may be thought to be distributed
normally.

Using the results:

(1) Draw a scatter diagram.
(2) Guess the value of the correlation
    coefficient from the scatter diagram.
(3) Calculate r.
(4) Calculate $\rho$.















9.44 contd.

X:   1    2    3    5    5    5    7    7    9    10
Y:   3    5    5    2    4    6    7    10   8    12

(1) Scatter diagram

[Blank grid for the scatter diagram: Y axis marked 0 to 12, X axis marked 0 to 10.]

(2) Your estimate of the correlation coefficient = ......

(3) Calculation of r.
9.44 contd.

     X      Y      $X^2$      $Y^2$      XY
     1      3
     2      5
     3      5
     5      2
     5      4
     5      6
     7      7
     7     10
     9      8
    10     12

$\Sigma X$ =     $\Sigma Y$ =     $\Sigma(X^2)$ =     $\Sigma(Y^2)$ =     $\Sigma(XY)$ =

N =

$r = \dfrac{\Sigma(XY) - \dfrac{(\Sigma X)(\Sigma Y)}{N}}{\sqrt{\left[\Sigma(X^2) - \dfrac{(\Sigma X)^2}{N}\right]\left[\Sigma(Y^2) - \dfrac{(\Sigma Y)^2}{N}\right]}}$ =


(4) Calculation of $\rho$

     X      Y      Ranked X      Ranked Y      D      $D^2$
     1      3
     2      5
     3      5
     5      2
     5      4
     5      6
     7      7
     7     10
     9      8
    10     12

                                              $\Sigma(D^2)$ =
(Check that both ranks sum to 55 and Σ(D) = 0.)

N =

$$ \rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = $$

Does r approximately equal ρ?
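For readers who would like to check their arithmetic for Frame 9.44, the calculation can be sketched in Python. This is only an illustrative sketch: the function names are invented here, tied values are given the average of the ranks they occupy (an assumption, though it is consistent with both sets of ranks summing to 55), and the simple formula for ρ is used even though ties are present.

```python
from math import sqrt

# Data from Frame 9.44: X = erythrocyte sedimentation rate,
# Y = number of leucocytes in thousands.
X = [1, 2, 3, 5, 5, 5, 7, 7, 9, 10]
Y = [3, 5, 5, 2, 4, 6, 7, 10, 8, 12]


def pearson_r(x, y):
    """Pearson's r built from the same sums as the worksheet."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xx = sum(v * v for v in x)
    sum_yy = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_xx - sum_x ** 2 / n) * (sum_yy - sum_y ** 2 / n))
    return numerator / denominator


def average_ranks(values):
    """Rank from smallest (1) upwards; tied values share their mean rank."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1   # 1-based position of the first tied value
        count = ordered.count(v)       # how many values are tied at v
        ranks.append(first + (count - 1) / 2)
    return ranks


def spearman_rho(x, y):
    """Spearman's rho from the differences D between the two rankings."""
    n = len(x)
    d = [rx - ry for rx, ry in zip(average_ranks(x), average_ranks(y))]
    return 1 - 6 * sum(v * v for v in d) / (n * (n * n - 1))


r = pearson_r(X, Y)
rho = spearman_rho(X, Y)
print(f"r = {r:.2f}, rho = {rho:.2f}")
```

Under these assumptions the two coefficients come out close to each other, which is what the last question of the frame is probing.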


SUMMARY

The 2 most frequently used correlation coefficients are Pearson's and Spearman's.

Pearson's correlation coefficient is used when both variables are normally distributed. It is symbolised by r where

$$ r = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{N}}{\sqrt{\left(\sum X^2 - \dfrac{(\sum X)^2}{N}\right)\left(\sum Y^2 - \dfrac{(\sum Y)^2}{N}\right)}} $$

(N is the number of pairs of results.)

Spearman's correlation coefficient is used when either variable is not normally distributed, as well as when only ranks are available. It is not such an accurate measure as Pearson's and is symbolised by ρ (rho) where

$$ \rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)} $$

and D is the difference between rankings.

Both r and ρ range from +1 to -1. They have sign and magnitude. +1 signifies maximum positive correlation, -1 maximum negative correlation, and 0 means no correlation at all.
Part II Ideas To Improve The Value Of Numbers
Chapter 10 POPULATIONS AND SAMPLES


INTRODUCTION 


If you look at one of the early volumes of the Journal of the Royal Statistical Society [...] In the 1830's research workers aimed to use entire populations. Their task was usually impossibly hard and their work suffered accordingly. Today's research workers would consider only a sample of that population and would draw their inferences from this. As you may imagine, sample to population transplants are at best hazardous, with a lot of potential pitfalls [...] In medicine it is not enough to treat the patient; you have to assess the underlying condition. Likewise it is not enough to describe the results in the sample; you have to be able to assess the worth of that particular sample.


10.1 A population is an entire group about which some specific information is required or recorded. What is the population in Frame 1.9?

All doctors in Country X.

10.2 The population is of prime importance as it is the subject of an experiment. It must be fully defined so that those to be included and excluded are clearly stated. For example, in the last frame you would need to know whether those doctors are included who are retired, part-time, or on leave, or who have in fact left that country while remaining on the register. Do you?

No. 'All doctors in that country' is not a fully defined population.

10.3 In the population 'your medical faculty' you would include which of the following?

1) Teaching staff excluding part-time.
2) Teaching staff including part-time.
3) The medical librarians.
4) The medical students.
5) The cleaners in the medical school.
6) The bodies in the dissecting room.

It's up to you. I would include (1) but am not very broad-minded. 'Your medical faculty' is not a fully defined population until it is made perfectly clear whom to include.




10.4 A statistical population need not be made up of people. We can have populations of birthweights, haemoglobin levels or blood cells so long as the population is what?

Fully defined.

10.5 A sample is any part of the fully defined population. A syringeful of your blood taken now is a sample of what population?

All your blood in circulation at the moment.

10.6 Sometimes, as above, a sample is the only means we have of inferring about a population. Sampling is also slower/quicker and cheaper/dearer than the complete enumeration of the population.

Quicker, cheaper.

10.7 Any inferences from a sample refer only to the particular population defined.

Among a sample of patients in your teaching hospital it is found that only patients with cancer of the lung smoke more than 40 cigarettes daily. Does this indicate that smoking more than 40 cigarettes daily is associated with cancer of the lung?

Yes. Pedantically, among the patients in your teaching hospital only.

10.8 Of course, this finding is nevertheless interesting, but only as a pointer to further research. The data on doctors in Country X tell you ........ about doctors in neighbouring countries.

Nothing.

10.9 What is a sample?

Part of a defined population.

10.10 What are X̄ and s² the symbols for?

Mean and variance.

10.11 In fact X̄ and s² are the symbols for the mean and variance of the sample. Guess what μ and σ² are the symbols for.

The mean and variance of the population.
10.12 μ and σ² are called mu and sigma squared. σ is pronounced ........ and represents what?

Sigma. Standard deviation of the population.

10.13 σ is small sigma. Capital sigma is drawn ........ and means ........

Σ. Add together.

10.14 A parameter is a constant used in describing a population. σ and μ are examples of parameters and X̄ and s are examples of statistics. What is the difference between parameters and statistics?

Parameters refer to the population and statistics to samples from the population.

10.15 Statistics/Parameters are used to infer about Statistics/Parameters.

Statistics. Parameters.

10.16 Each population has one/many value(s) of μ and one/many value(s) of X̄.

One. Many.

10.17 [Figure: a frequency distribution curve]

This is the frequency distribution of what?

A population.

10.18 If you had not enumerated a population you could estimate σ by calculating s using which formula?
10.19 Statistics refer to samples. Parameters refer to populations. How can you remember this?

S and S. P and P.

10.20 The size of the sample to be used is ideally a statistical consideration but is often limited in terms of T........ and C........

Time and Cost.

10.21 For each population there is one value for each parameter, but for each population there are many possible samples, each with their own ........ estimating this parameter, i.e. for every μ there are many possible X̄s assessing it.

Statistic. Each value of X̄ may differ slightly but they all should approximate to μ.

10.22 For every μ there are many possible X̄s assessing it!

10.23 Good samples produce reliable ........ while bad samples don't.

Statistics.

10.24 The statistics shown in A are/are not better than those in B.

Are not. They are not such close estimates.

Therefore in practice we must design our methods for obtaining samples so that, as in B, we could rely fairly well on any possible value of X̄.
10.28 In Frame 10.24 A/B is more precise and A/B is more biased.

B (the statistics lie closer together).

A (the average statistic lies further from the parameter).

10.29 [Figure: the statistics from A and from B plotted relative to the parameter]

Here A/B is more precise but more biased.

A. The statistics lie close together but are not on the target.

10.30 Good samples have statistics which are ........ and ........

Precise. Unbiased.

10.31 We will discuss bias again in the next chapter. We will conclude this chapter by chatting further about precise statistics. What is precision?

How closely the statistics lie to each other.

10.32 Remember that precision says nothing about whether statistics are on target. However, an unbiased experiment without precision may still easily cause a statistic to be an off-target estimate. Below is an unbiased, non-precise experiment. If your particular estimate is ........ it would not be very worthwhile.

X, X,,



10.33 The precision of a sample can be estimated. It equals $\sqrt{N}/\sigma$. Here N is the size of the sample and σ is ........

The standard deviation of the population.



10.34 For any particular set of circumstances σ in the above formula is a constant. To vary precision ........ must be varied.

N

10.35 To increase precision N must be decreased/increased.

Increased. This is what you would expect: the bigger the sample the more precise the estimate.

10.36 A sample of 100 is ........ as precise as one of 25.

Twice. $\sqrt{100}/\sigma$ is twice as big as $\sqrt{25}/\sigma$.

10.37 To double precision N must be ........

Quadrupled (multiplied by 4).

10.38 Research workers often ask a statistician how large a sample they need to use. The reply depends on the level of precision required and the value of ........

σ

10.39 If you know how precise an estimate you need but don't know σ you can use a pilot survey or, to be up-to-date, a mini-survey, to obtain an estimate of σ. You would in fact calculate ........ from the pilot survey sample and substitute it in the formula.

s

Precision = $\sqrt{N}/s$
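The arithmetic in Frames 10.33 to 10.39 can be tried out with a short Python sketch. The function names are invented for illustration, and the values of σ and of the target precision are made-up inputs, not figures from the text.

```python
from math import ceil, isclose, sqrt


def precision(n, sigma):
    """Precision of a sample of size n, taken as sqrt(N) / sigma."""
    return sqrt(n) / sigma


def sample_size_for(target_precision, sigma_estimate):
    """Smallest N whose precision reaches the target.

    Rearranges precision = sqrt(N) / sigma into N = (precision * sigma) ** 2.
    sigma_estimate would be s from a pilot survey when sigma is unknown.
    """
    return ceil((target_precision * sigma_estimate) ** 2)


sigma = 6.0  # illustrative value only

# A sample of 100 is twice as precise as one of 25 (Frame 10.36).
print(precision(100, sigma) / precision(25, sigma))

# Quadrupling N doubles the precision (Frame 10.37).
print(isclose(precision(4 * 25, sigma), 2 * precision(25, sigma)))

# Sample size needed for a chosen precision, given an estimate of sigma.
print(sample_size_for(2.0, sigma))
```

Because precision grows only with the square root of N, each further doubling of precision costs four times as many observations, which is why time and cost so often set the limit.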


10.40 What is a sample?

Part of a defined population.

10.41 From any particular sample you can estimate what?

The parameters in the relevant fully defined population only.

10.42 Give 2 characteristics of a good statistic.

It is precise and unbiased.




10.43 What is precision?

How closely the statistics lie together.

10.44 Precision is estimated using what formula?

$\sqrt{N}/\sigma$

10.45 N represents what?

The sample size.

10.46 To increase precision what must you do?

Take a bigger sample.

10.47 A sample which is too small may be unbiased but it is too ........ to draw valid conclusions.

Imprecise.


SUMMARY

It is not enough for you to be able to describe numbers; you must also be able to evaluate their worth when samples are used.

The population is the entire group in which you are interested. A sample is a portion of that population. The population must be clearly and exhaustively defined before a sample is drawn from it. If we are not sure whether a certain type of patient is included in the population because the population is not fully defined, we cannot be sure that any conclusion based on the sample refers to that particular type of patient. Any conclusions based on information from the sample only refer to the particular population as defined.
μ (mu) and σ (sigma) are parameters and refer to the population. X̄ and s are the equivalent statistics in the sample and are used to estimate the parameters. The only accurate way of estimating parameters is a complete population enumeration. Sampling is cheaper and quicker and is occasionally the only method of estimation available. The inference about a parameter using statistics is always hazardous, even with good samples.
One of the characteristics of a good sample is precision, or having statistics lying close together. Precision is measured by $\sqrt{N}/\sigma$ where N is the sample size and σ the standard deviation in the population. σ can be estimated, if unknown, from a pilot survey. To increase precision the sample size must be increased. To double precision the sample size must be increased fourfold.
Chapter 11 FAIRNESS IN SAMPLING

How to be on target

INTRODUCTION

To be really 'with it', statistics must be unbiased as well as precise.


11.1 ........ is the term describing the closeness of statistics to each other.

Precision.

11.2 What is bias?

The term describing how far the average statistic is from the parameter.

11.3 [Figure: four targets, (a) to (d), each marked with a set of statistics (x)]

Above,

........ shows good precision but is biased.      (a)
........ shows good precision and is unbiased.    (d)
........ shows poor precision but is unbiased.    (c)
........ shows poor precision and is biased.      (b)
11.4 In the last frame ........ represents the best state of affairs and ........ represents the worst.

(d)

(b)
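The four situations (a) to (d) can be imitated numerically. In the Python sketch below every number is hypothetical: `PARAMETER` stands for the population value, each list stands for the statistics obtained from repeated samples, and the spread of the statistics (their standard deviation) is used as one reasonable way of expressing how closely they lie together.

```python
from statistics import mean, pstdev

PARAMETER = 30.0  # hypothetical population value being estimated

# Hypothetical statistics from repeated samples, one list per target.
estimates = {
    "a": [34.8, 35.0, 35.1, 35.2, 34.9],  # close together, off target
    "b": [28.0, 41.0, 33.0, 38.0, 35.0],  # scattered, off target
    "c": [23.0, 36.0, 28.0, 33.0, 30.0],  # scattered, on target
    "d": [29.8, 30.0, 30.1, 30.2, 29.9],  # close together, on target
}


def bias(stats, parameter):
    """How far the average statistic lies from the parameter."""
    return mean(stats) - parameter


def spread(stats):
    """How far apart the statistics lie; a small spread means good precision."""
    return pstdev(stats)


for label, stats in estimates.items():
    print(f"({label}) bias = {bias(stats, PARAMETER):+.1f}, spread = {spread(stats):.2f}")
```

A small spread with a large bias corresponds to (a): the statistics agree with each other but not with the parameter.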

11.5 How do you improve precision?

Increase the sample size.

11.6 Before we can minimise bias we must decide how it arises. Different sorts of bias, like different diseases, require different treatments.

If you wanted to find the average weight of adult males in a town you would/would not take as your sample the weights of the rugby team? Why?

Would not. They would be a biased sample.

11.7 The commonest source of bias, as in the last frame, is in selecting the sample.

The treatment for bias in sampling is randomisation, once described as 'the price of fair play'. One kind of random sample is the simple random sample: each member of the population has an equal chance of inclusion in this type of sample. If your population is not exhaustively and clearly defined can you have a simple random sample?

No. If you do not know the constitution of the population the members cannot have an equal chance of inclusion.

N.B. There are other kinds of random samples but we will not discuss them further in this programme.

11.8 A sample of 6 patients with disease X is required for a series of complicated tests from a population of 100 patients already numbered 00 to 99.

Slips of paper numbered 00 to 99 are mixed well in a hat or sterilizing drum and the first 6 numbers drawn out are the sample. Is this a simple random sample?

Yes. All patients had an equal chance of being drawn; initially the members of the population were numbered consecutively and the required number was drawn.

11.9 Do the winning lottery tickets constitute a simple random sample?

Yes.

11.10 What is a simple random sample?

One in which each member of the population has an equal chance of selection.
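The hat (or sterilizing drum) of Frame 11.8 has a direct modern counterpart. The sketch below uses Python's standard `random` module; the function name and the seed are illustrative choices, and the seed is fixed only so that the same draw can be repeated.

```python
import random


def draw_from_hat(population_ids, sample_size, seed=None):
    """Draw slips without putting them back, so nobody can be drawn twice."""
    rng = random.Random(seed)
    return rng.sample(population_ids, sample_size)


# The 100 patients, already numbered 00 to 99.
patients = [f"{i:02d}" for i in range(100)]

sample = draw_from_hat(patients, 6, seed=1)
print(sample)
```

Every patient has the same chance of being drawn, which is the defining property of the simple random sample.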
11.11 What is an advantage of the simple random sample?

It guards against bias in selecting the sample.

11.12 To save time writing out numbers every time we want a random sample we can use instead published tables of random numbers. See the pull-out. These numbers were originally chosen so that they were free from bias.

Instead of choosing 6 numbers between 00 and 99 out of the sterilizing drum, we could read off 6 numbers each with 2 digits from the table of random numbers.

Look at the table of random numbers. Start at the top left-hand corner and read down the first 2 columns. Which are the 6 numbers which would constitute your random sample of patients?

Patients numbered
06
34
34
47
93
86

We will discuss the fate of the unfortunate patient No. 34, included twice, later.

11.13 Remember that before drawing a random sample we must define the population and give each member a ........

Number.

11.14 If 660 patients had constituted the population we would need to read ........ columns together rather than 2.

3. Otherwise those numbered in the hundreds could not be included.

11.15 If numbers turned up which were bigger than the number in the population we would ignore them and continue until we had filled the sample spaces with numbers of the required size. Why is it better to number a population of 100, 00 to 99 rather than 1 to 100?

As it stands you would need to use 3 columns and would waste many of the numbers.

11.16 If the same number occurs twice in the table you include it twice in your sample (if you are being pedantically correct). Can a person appear more than once in the sample?

Yes, in theory. Patient 34 did, back in Frame 11.12. Many people in practice would reject it a second time.
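The rules in Frames 11.13 to 11.16 can be collected into a short Python sketch. The function name is invented here, and the last two numbers in `table` are made up so that there is something left to read once repeats are rejected; they do not come from the pull-out.

```python
def read_from_table(table_numbers, population_size, sample_size, allow_repeats=True):
    """Read numbers in order from a random number table.

    Members are assumed to be numbered 0 to population_size - 1, so any
    number outside that range is ignored. With allow_repeats=False a number
    that has already been chosen is rejected the second time.
    """
    chosen = []
    for number in table_numbers:
        if number >= population_size:
            continue  # bigger than any number in the population: ignore it
        if not allow_repeats and number in chosen:
            continue  # the practical choice: reject a repeat
        chosen.append(number)
        if len(chosen) == sample_size:
            return chosen
    raise ValueError("the table ran out before the sample was filled")


# The six numbers read in Frame 11.12, followed by two hypothetical ones.
table = [6, 34, 34, 47, 93, 86, 12, 71]

print(read_from_table(table, 100, 6))                       # pedantically correct
print(read_from_table(table, 100, 6, allow_repeats=False))  # repeat rejected
print(read_from_table(table, 50, 4))                        # only 50 patients
```

With repeats allowed patient 34 appears twice; with repeats rejected the reading simply carries on to the next usable number.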
11.17 It may be a source of bias in itself if you regularly use the tables and start at the same place (or if you look at the tables while deciding where to start). How can you avoid this?

Vary your starting point, deciding on it before seeing the tables.

11.18 The numbers are read off consecutively from your selected starting point. The numbers can be read upwards, downwards or sideways. When is this decision taken?

Before seeing the tables.

11.19 What is the variable here?

Weight gained over 12 months (in oz) for 100 one year old children.

Weight Gained in oz of 100 1 year old Children in 12 months

Child's No.  Gain    Child's No.  Gain    Child's No.  Gain    Child's No.  Gain
    00        31         25        33         50        36         75        20
    01        27         26        29         51        29         76        37
    02        23         27        32         52        30         77        27
    03        33         28        37         53        35         78        31
    04        30         29        34         54        28         79        21
    05        30         30        30         55        41         80        33
    06        38         31        34         56        32         81        36
    07        33         32        28         57        30         82        31
    08        25         33        29         58        29         83        36
    09        34         34        35         59        27         84        37
    10        26         35        33         60        32         85        36
    11        32         36        28         61        38         86        34
    12        22         37        19         62        34         87        [?]
    13        28         38        32         63        22         88        30
    14        17         39        39         64        35         89        26
    15        36         40        28         65        26         90        31
    16        31         41        23         66        24         91        33
    17        24         42        33         67        29         92        31
    18        23         43        30         68        27         93        26
    19        25         44        29         69        35         94        34
    20        28         45        25         70        24         95        32
    21        26         46        23         71        29         96        27
    22        40         47        22         72        27         97        31
    23        33         48        44         73        27         98        20
    24        29         49        31         74        40         99        30
11.20 In the last frame the children are numbered 00 to 99, note, rather than 01 to 100. How many columns do you need to read together to draw a random sample using the table of random numbers?

2

11.21 We want a random sample of 10 of these children. I have decided to start in columns 7 and 8, row 11 in the table of random numbers, reading sideways to the right first. What is my sample of numbers? (See the pull-out.)

88  99  40
36  36  47
50  48  33
06

36 is included the second time.

11.22 Therefore, what is my sample of weight gains using these random numbers to choose the 10 children from Frame 11.19?

30 oz
30 oz
28 oz
28 oz
28 oz
22 oz
36 oz
44 oz
29 oz
30 oz

11.23 In my sample X̄ = 30.5 and s = 5.8 (compared with the population values μ = 30 and σ = 6).

Do you think this is reasonably accurate?

I think so, considering the sample is fairly small.
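The sample mean and standard deviation of the ten weight gains listed in Frame 11.22 can be checked with Python's standard `statistics` module. Which formula for s the text intends is an assumption here: `stdev` divides by N - 1, and with that choice the ten gains give a value of about 5.8.

```python
from statistics import mean, stdev

# The ten weight gains (in oz) listed in Frame 11.22.
gains_oz = [30, 30, 28, 28, 28, 22, 36, 44, 29, 30]

x_bar = mean(gains_oz)  # sample mean
s = stdev(gains_oz)     # sample standard deviation (N - 1 in the denominator)

print(f"mean = {x_bar:.1f} oz, s = {s:.1f} oz")
```

The same two lines can be reused for the reader's own sample of ten children in the next frame.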



11.24 Now, guided by the scheme below, choose your own simple random sample of 10 children.

Column No. ........
Row No. ........
Direction ........

10 random numbers (from the pull-out) = ........

It is very unlikely that your own simple random sample will be identical to mine or anyone else's in your group. All values of X̄ and s should approximate to 30 and 6, though; do yours?
10 weight gains (Frame 11.19) = ........

Therefore your value of X̄ = ........
and your value of s = ........

11.25 You may be thinking that the simple random sample is a lot of bother. You have a good point. Why is it used?

It is one of the samples which protects from bias in sampling. In fact a simple random sample can occasionally be off target but we can calculate the chance of this happening: it makes chance work for us rather than against us.

11.26 These are the steps in drawing a simple random sample; put them in order.

(a) Read off the sample of random numbers.
(b) Allot a number to each member of the population.
(c) Decide where to start in the table of random numbers.
(d) Refer the random sample of numbers to the population and read off the corresponding results.
(e) Decide in which direction to read the table of random numbers.

(b)
(c)
(e)
(a)
(d)

11.27 The reason for the random sample is not so much that it is unbiased but that it allows us to estimate the degree of bias we can expect. There is no short-cut to unbiased sampling. Some people think 'haphazard' or 'willy-nilly' samples are synonymous with random samples. To use this method you go through the population and choose the ones which you feel like choosing. Is this an acceptable method?

No. Even though people are trying to be fair it is surprising how bias creeps in when people use haphazard sampling methods.

11.28 Of what value is a 'willy-nilly sample'?

Very limited.

11.29 A random sample is often beyond the reach of many practising doctors. A biopsy specimen and a syringeful of blood are not random samples but they are nevertheless useful.

A particular doctor's patients are not a random sample of the local population from which they are drawn. What should a doctor do if he discovers an interesting fact about them?

(1) Refuse to write it up in the journals because it isn't a random sample.
(2) Write it up and call it a random sample.
(3) Write it up and point out it is a non-random sample.

(3) Then his work can be checked and followed up by others with more time or facilities.

11.30 If you read an article about an experiment in which no mention is made of whether it is in fact a random sample, what should you assume?

That it is not a random sample. If the author had gone to all the trouble of drawing a random sample he would have said so [...]

11.31 Which is better: a non-random sample labelled as such or an undefined sample?

A non-random sample. At least you know where you are.

11.32 Give 2 characteristics of a good sample.

It is precise and unbiased.

11.33 Bias in sampling can be controlled. How?

By taking, for example, a simple random sample.

11.34 What is a simple random sample?

One in which all members of the population are numbered and have an equal chance of selection.
11.35 List the steps in taking a simple random sample.

Allot numbers.
Choose a starting point in the tables.
Choose the direction for reading the tables.
Read off the sample numbers.
Read off the results.

11.36 Experiments almost invariably need a control sample as a yardstick against which to measure the evidence. A control group is one identical to the experimental sample in all respects except the factor under consideration. To be unbiased the control sample is selected ........

Randomly.

11.37 X-rays of adult African males with a particular disease are being investigated. What is the control group?

X-rays of adult African males without the disease, chosen at random. Sometimes in journals people omit a control group or use one which is in fact wrong in that particular experimental situation. If the experimental group is hospitalised it is wrong to compare it with people outside unless hospitalisation is the factor under review.

11.38 Experiments almost invariably need some control!
11.39 Even a properly controlled experiment can have biased results, especially if these results are subjective (based on opinion or what a person says) rather than objective (based on facts or what is measured). Are the following subjective or objective?

(a) Haemoglobin levels.
(b) Patients' response to a pain-killing drug.
(c) Birth weights.
(d) Number of cigarettes smoked daily.

(a) Objective.
(b) Subjective.
(c) Objective.
(d) Subjective, unless you count the stubs.

11.40 Subjective results can be very biased. Often a patient in a drug trial will sense what the doctor would like him to say. He might either deliberately try to please his doctor or displease him. Often a research doctor interprets subjective results subconsciously to fit in with his mood or theory. Does randomisation guard against this sort of bias?

No.

11.41 These sources of bias tend to be limited by using 'blind' or 'double-blind' methods. A 'blind' experiment is one where the patient does not know in which group he is, for example, whether he is receiving the drug or not (the control group is often given an inactive tablet called a placebo). A 'double-blind' experiment is one where neither the patient nor the doctor is aware of the treatment received by the patient. How does this improve the bias situation?

The patients in both the control and the other group will tend to be equally misleading. The doctor in the 'double-blind' situation cannot interpret the results to fit the particular theory.

11.42 How should a patient be allotted to a particular treatment?

At random.
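One way of allotting patients to treatments at random is sketched below in Python. The function name, the patient numbers and the seed are all illustrative assumptions; the sketch covers only the allocation step, not the blinding of patient or doctor.

```python
import random


def allocate(patient_ids, seed=None):
    """Shuffle the patients and split them into two groups of equal size."""
    rng = random.Random(seed)
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)  # every ordering of the patients is equally likely
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}


# Twenty hypothetical patients numbered 0 to 19.
groups = allocate(range(20), seed=7)
print("treatment:", sorted(groups["treatment"]))
print("control:  ", sorted(groups["control"]))
```

Because the split is decided by the shuffle and not by the doctor, neither group is favoured by how its members were picked.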
11.43 A psychiatrist wants to see whether a new drug called 'Snoze' is effective against insomnia. His results are to be in 'number of hours slept during the first week on the drug'.

(1) How does he decide which patients receive 'Snoze'?

(1) At random among his patients suffering from insomnia.

(2) Which is the control group?

(2) Another group selected at random from his patients suffering from insomnia.

(3) The results are subjective/objective?

(3) Subjective.

(4) The control group need/need not be prescribed a placebo.

(4) Need.

11.44 Read this passage and then answer the following questions.

'An experiment is reported in a journal of physiology in which 2 different dietary regimes, A and B, were compared. Initially 120 normal and 25 underweight children were chosen and weighed clothed. For a 6 months period the underweight children were fed on dietary regime A and were given an antibiotic daily. The normal children were fed dietary regime B. At the end of the trial the mean weight gain of the underweight children was greater than that of the normal children.'

The samples are/are not random?

They are not mentioned as being random, so presumably they are not.

11.45 The correct/incorrect control group has been used?

Incorrect. Both groups should either be normal or underweight to begin with. Any difference could be due to this initial difference.

11.46 Why has the doctor given the antibiotic?

Even if this is ethical it is a further mistake as it adds another misleading factor. The difference could now also be due to the antibiotic.
11.47 It would/would not be better to weigh the children unclothed?

Would, because clothes vary in weight and precision would therefore be increased (because σ would be decreased).

11.48 The results are subjective/objective?

Objective.

11.49 He therefore need/need not use a blind experiment?

Need not.

11.50 Which sample is more precise and why?

The 120 normal children, because this sample is bigger.

11.51 If you are so naive that you think this sort of article isn't published, good luck! One further consideration is important when thinking about samples. This is whether the samples are chosen before embarking on the research (prospectively), or whether patients already fall into the groups and it is only effects which are being compared (retrospectively). Frame 11.45 is an example of a prospective/retrospective survey?

Prospective. The samples were chosen before the different diets were given.

11.52 A doctor has compared people who have had a heart attack with a control group to see whether their fat consumption has been higher. This is a prospective/retrospective study and what constitutes his control group?

Retrospective. The heart attack had already happened before the survey started. The heart attack patients were already grouped. His control group is a random sample of people identical save for the heart attack.

11.53 Retrospective studies are often so biased that they have been stigmatised as 'backward in 2 senses'. To make the experiment in the last frame a prospective study, you would do what?

Observe a group with a high fat consumption and a group with a low fat consumption and wait to contrast the rates of heart attacks.
11.54 Prospective studies are usually bigger. For example, if the average heart attack rate is 1 per 1,000 people, 100,000 would need to be observed prospectively to find 100 heart attack cases. They are also usually more costly and time consuming. What is their advantage?

They are usually less biased than their retrospective counterparts as the groups can be chosen randomly (and moreover facilities can usually be included at the same time to investigate other relevant factors).

11.55 If you compared the I.Q.s of people with bilharzia with others this is a prospective/retrospective study?

Retrospective. This shows another disadvantage with retrospective studies. If people with bilharzia were shown to be more stupid, you would be unable to say whether they had been stupid and contracted the disease or whether the disease had made them stupid.

11.56 What is the control group in the last frame?

The I.Q.s of similar people without bilharzia.

11.57 Give 2 characteristics of a good sample.

It is precise and unbiased.

11.58 The size of the sample affects the precision and/and not the bias.

And not.

11.59 How can you guard against bias in sampling?

By using a random sample.

11.60 In which of the following situations is bias a problem?

(a) Unprecise
(b) Objective
(c) Retrospective
(d) Non-random
(e) Subjective

(c), (d) and (e): Retrospective, Non-random, Subjective.

11.61 How do you guard against bias with a subjective experiment if you can't make it objective?

Control (placebo).
Blind and double-blind experiments.
11.62 Practical Example

A less knowledgeable colleague than you wants to compare 2 slimming tablets. Tell him exactly what he must do to produce satisfactory results.


SUMMARY 

Bias produces unreliable results because the statistics lie away from the
parameter which they are to estimate - they are off target.

Randomisation is the insurance against bias arising at the sampling stage. In
fact the simple random sample does not eliminate bias but it does allow us to
estimate the size of the bias problem. To choose a simple random sample
the population members are numbered and tables of random numbers are
used to read off the numbers of the population members to be included in
the sample. The starting point and direction for reading the tables of random
numbers are chosen before the tables are opened. Numbers occurring more
than once can be included more than once but bigger numbers in the table
than those used to number the population are ignored.

Subjective results (based on opinion rather than facts) are more prone to
bias than objective results. A control group (with a placebo in drug trials) and
blind or double blind experimental designs can be used to diminish the effect
of bias with subjective results. The control group must be as like the group
under investigation as possible save for the variable under consideration. It
should also be chosen randomly.

Retrospective (backward looking) studies are generally more biased than
prospective studies although they are usually smaller, cheaper and quicker
to perform.
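
The random-number-table procedure in this summary can be imitated on a computer. What follows is only a sketch, not part of the programme itself: the function name `simple_random_sample`, the population of 100 and the sample of 10 are all assumptions made for illustration, and Python's `random` module stands in for the printed tables.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Pick member numbers 1..population_size the way a random-number table is read."""
    rng = random.Random(seed)                 # fixing the seed is like fixing the starting point
    digits = len(str(population_size))        # how many digits to read at a time
    chosen = []
    while len(chosen) < sample_size:
        number = rng.randrange(10 ** digits)  # one group of random digits, e.g. 000 to 999
        if 1 <= number <= population_size:    # bigger numbers (and 0) are ignored
            chosen.append(number)             # repeats are kept, as the summary allows
    return chosen

# Hypothetical example: 10 members drawn from a population numbered 1 to 100.
sample = simple_random_sample(population_size=100, sample_size=10, seed=1)
print(sample)
```

Because repeats are allowed, the same member may appear twice in `sample`; that matches the rule stated above rather than the more familiar sampling without replacement.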
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Adapting the Numbers 











Chapter 12 


WHAT HAPPENS WHEN WE TAKE SAMPLES 


INTRODUCTION 

In the last chapter you all calculated a value $\bar{X}$ to estimate $\mu$ for the weight
gains. Each sample was different and $\bar{X}$ varied. This is important
because often you calculate only 1 value of a sample mean to estimate a
population mean, and you want to know how reliable your estimate is
likely to be.


For a bit of revision:

12.1 What is the difference between a
parameter and a statistic?

A parameter refers to a
population and a statistic
to a sample.

12.2 Come to that - define a population.

All of something under
investigation.

12.3 A sample is a ...... of the
population.

Portion, part.

12.4 How can you choose an unbiased sample?

Randomly.

12.5 The statistic $\bar{X}$ estimates the parameter
$\mu$. How do you make $\bar{X}$ more precise?

Increase N, the size of the
sample.

12.6 Do you remember what a 'variable' is?
Why can $\bar{X}$ be called a variable?

Because it varies from
sample to sample.


12.7 What shape is the distribution of most
variables?

Normal: bell shaped,
symmetrical with 2 points
of inflection.

12.8 The distribution of $\bar{X}$ is no exception.
Sketch the distribution of the sample
mean.

[Figure: THE DISTRIBUTION OF THE SAMPLE MEAN]

12.9 For the distribution, in fact, to be
normal, the samples must be random.
This also makes the sample ......

Unbiased.

This normal distribution is
another virtue of the random
samples. (Incidentally, if the
samples are fairly large, $\bar{X}$ is
distributed normally whether
or not the underlying
population distribution is
itself normal.)

12.10 What is an unbiased sample?

One where the average
statistic is on target or
equals the parameter.

12.11 Therefore, what is the average or mean
value of the distribution of random
sample means?

$\mu$

If you got this answer right
you can go straight to frame
12.15. Otherwise go to the
next frame.

12.12 Imagine a population from which
lots of random samples have been taken
(for example the population of 100
weight gains in the last chapter).

[Figure: several values of $\bar{X}$ scattered closely around $\mu$.]

These values of $\bar{X}$ vary but are all nearly
equal to ......


$\mu$ (= 30)

12.13 In Frame 11.24 $\mu$ was 30, your value of
$\bar{X}$ was ...... and mine was 30.5.

Fill in your value of $\bar{X}$ here.

12.14 Therefore mark your value of $\bar{X}$ here.

[Figure]

12.15 Draw the distribution of the sample
mean, $\bar{X}$, as fully as possible.

[Figure: the distribution of the sample mean.]
(The variable is usually
shown at the far right end
of the horizontal axis.)

12.16 What have you assumed in the last frame?

That the samples are random,
and large.

12.17 Once you know the value of the standard
deviation you know all you need to know
about the distribution of $\bar{X}$. It is
$\sigma/\sqrt{N}$.
You have met it already (those who
read this book on their heads are at an
advantage!). Where?

Precision $\propto \sqrt{N}/\sigma$,
i.e. $\sigma/\sqrt{N} \propto 1/\text{precision}$.

12.18 This seems sensible. As the precision
of $\bar{X}$ increases you'd expect the
variation of $\bar{X}$ to increase/diminish.

Diminish.

12.19 What is the variance of the distribution
of $\bar{X}$?

$\sigma^2/N$

12.20 What does N represent?

The size of the sample.

12.21 If N is multiplied by 4 (i.e.
you quadruple the size of your sample),
the standard deviation of the distribution
of $\bar{X}$ is ...... and precision ......

Halved, doubled.

12.22 Draw the distribution of $\bar{X}$ taken from
large random samples.

[Figure]

12.23 $\sigma/\sqrt{N}$ is so important it is given a
special name and a special symbol:

$\sigma/\sqrt{N}$ = the standard error of the mean = $\sigma_{\bar{X}}$
(or $s_{\bar{X}}$ if $\sigma$ is not known).

Write the formula for precision using
this new symbol.

Precision $\propto 1/\sigma_{\bar{X}}$

12.24 [Figure: three distributions, labelled (a), (b) and (c).]

...... above is the distribution of a
population, ...... is the distribution of a
sample from a population and ...... the
distribution of random sample means.




12.25 The value of $\sigma$ for I.Q.'s of university
students is 15. You each take a random
sample of size 9. What is the value of
$\sigma_{\bar{X}}$?

$15/\sqrt{9} = 5$

12.26 What is $s_{\bar{X}}$ called?

The standard error of the
mean.

12.27 What value had $\sigma_{\bar{X}}$ for our distribution
of random samples of weight gain in
Frame 11.24?

$5/\sqrt{10}$

12.28 If you did not know $\sigma$ how would you
estimate $\sigma_{\bar{X}}$?

Use s instead of $\sigma$.

12.29 State a formula for calculating $s_{\bar{X}}$.

$s_{\bar{X}} = s/\sqrt{N}$

12.30 What is this distribution and its standard
deviation called and what exactly does
it represent?

[Figure]

Distribution of the sample
mean.

Standard error of the mean.
The distribution of all the
means of large random
samples of size N from
a given population.

12.31 If you alter the size of your sample you
do not change the value of the mean/
standard deviation of the distribution
of $\bar{X}$ but the mean/standard deviation
of this distribution.

Mean.
Standard deviation.

12.32 A random sample has 2 virtues.
What are they?

It is unbiased and the
values of $\bar{X}$ follow a defined
distribution, the distribution
of the sample means.

12.33 Draw the distribution mentioned in the
last frame.
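
The standard error frames above lend themselves to a quick check by computer. The sketch below is an illustration only; the function name `standard_error` is invented here, and the figures are the I.Q. example ($\sigma = 15$, N = 9) with the sample then quadrupled to 36.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: the standard deviation divided by the square root of N."""
    return sd / math.sqrt(n)

se_9 = standard_error(15, 9)     # sample of size 9
se_36 = standard_error(15, 36)   # four times the sample size

print(se_9, se_36)               # quadrupling N halves the standard error
print(se_9 / se_36)              # so the precision (1 / standard error) doubles
```

The same function serves when $\sigma$ is unknown: pass the sample standard deviation s instead and the result is $s_{\bar{X}}$.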



Often research workers wish to compare two means.
For example, soon we are going to see whether the mean birth weight of
offspring of diabetic mothers differs significantly from that of normal
mothers using the data from Chapter 3. Initially we assume that there is no
difference, that the two means come from identical populations or the same
population. To understand this fully later we will now take a quick look at


THE DISTRIBUTION OF THE DIFFERENCE
BETWEEN TWO SAMPLE MEANS


12.34 From its name what do you think the
variable is in the abovementioned
distribution?

$(\bar{X}_1 - \bar{X}_2)$

12.35 You can imagine how the variable
$(\bar{X}_1 - \bar{X}_2)$ is distributed.

Normally.

12.36 This is so under what conditions?

If the samples are random
and large.

12.37 What was your value of $\bar{X}$ in Chapter 11,
again? ......

12.38 Mine was 30.5. What value does $(\bar{X}_1 - \bar{X}_2)$
have in this case?

(Your value - 30.5)
or
(30.5 - your value).
Two of the many values of
$(\bar{X}_1 - \bar{X}_2)$.

12.39 If you subtracted your value $\bar{X}$ from
everybody else's and everybody else did
the same, all these values $(\bar{X}_1 - \bar{X}_2)$
would follow which distribution?

The distribution of the
difference of two sample
means. (N is small but $\sigma$ is
known so the distribution
is normal.)

12.40 Most values of $\bar{X}$ equal or nearly equal
...... which they estimate.

$\mu$

12.41 Some are a little bit bigger and an equal
number are a little bit smaller than $\mu$.
Therefore what do you think the average
value of $(\bar{X}_1 - \bar{X}_2)$ equals?

0

Some answers are a little bit
bigger and others a little bit
smaller but the average
difference is 0.

12.42 Draw the sampling distribution of the
difference between
two sample means, showing the variable
$(\bar{X}_1 - \bar{X}_2)$ at the far right hand side of
the horizontal axis.



12.43 The standard deviation for the
distribution of the mean, again, is what?

$\sigma/\sqrt{N}$

12.44 The variance in this distribution equals
what?

$\sigma^2/N$

12.45 It is a fact that the variance in the
distribution of the difference between
two sample means is

$\dfrac{\sigma^2}{N_1} + \dfrac{\sigma^2}{N_2}$

This variance in words equals what?

The sum of the variances
of the two individual sample
means.

12.46 i.e. the variance of the distribution of

$(\bar{X}_1 - \bar{X}_2) = \dfrac{\sigma^2}{N_1} + \dfrac{\sigma^2}{N_2}$

What is the standard deviation of this
distribution?

$\sqrt{\dfrac{\sigma^2}{N_1} + \dfrac{\sigma^2}{N_2}}$

Don't lose heart, this is the
programme's algebraic
summit!

12.47 In our random samples from the weight
gains in Frame 11.24 all values of N
were equal to ...... and $\sigma$ = ......

10, 5

12.48 The distribution of $(\bar{X}_1 - \bar{X}_2)$ for these
weight gains had a variance

$\dfrac{\sigma^2}{N_1} + \dfrac{\sigma^2}{N_2}$

equal to what number?

$\dfrac{25}{10} + \dfrac{25}{10} = 5$

12.49 The standard deviation

$\sqrt{\dfrac{\sigma^2}{N_1} + \dfrac{\sigma^2}{N_2}}$

equals what number?

$\sqrt{5}$


12.50 


Algebraically

12.51 Which distribution is this?

[Figure]

The distribution of the
difference between two
sample means.

12.52 How big are the samples?

$N_1$ and $N_2$

12.53 What kind of samples?

Random.

12.54 What distribution is this?

[Figure]

A population distribution.

12.55 What distribution is this?

[Figure]

A sample distribution.

12.56 Draw the distribution of the sample
mean, $\bar{X}$, when the samples are random
and large.

12.57 What is $\sigma/\sqrt{N}$ called and what is its symbol?

The standard error of the
mean, $\sigma_{\bar{X}}$ or $s_{\bar{X}}$.

12.58 What exactly is the distribution of the
sample mean?

The distribution of all sample
means of random samples
of size N drawn from a
population.

12.59 Draw the distribution of the
difference between two sample
means $(\bar{X}_1 - \bar{X}_2)$.

12.60 What exactly is this distribution?

The distribution of the
difference between two sample
means, one taken from a
random sample size $N_1$ and
one from a random sample size $N_2$.

SUMMARY 

Frames 12.54 to 12.60 serve well as the summary. You need to recognise
these distributions when we use them again later.

Most readers inform me that this is the most difficult chapter in the whole
book. The significance tests used in Chapters 16 and 17 to analyse actual
research are based on these two sampling distributions. You will be able to
perform these tests without fully understanding this chapter but of course
you will be performing them rather in the dark. Should you be unhappy
about the contents of this chapter I suggest you read it again now before
proceeding.

Note: To be pedantically correct sample means follow the normal curve if
the samples are large and random whatever the shape of the distribution
in the population. In small samples, so long as $\sigma$ is known, $\bar{X}$ also follows the
normal curve.
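
Readers who remain in the dark about the two sampling distributions may find a small simulation helpful. This is a sketch under stated assumptions, not the book's own data: the population of 100 values is made up (it is not the weight gains of Chapter 11), samples of size N = 10 are drawn with replacement, and all names are chosen here for illustration.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population of 100 values, invented for this sketch.
population = [rng.gauss(30, 5) for _ in range(100)]
mu = statistics.fmean(population)       # the parameter
sigma = statistics.pstdev(population)   # the population standard deviation

N = 10  # size of every random sample

def sample_mean():
    """Mean of one random sample of size N, drawn with replacement."""
    return statistics.fmean(rng.choices(population, k=N))

# The distribution of the sample mean.
means = [sample_mean() for _ in range(20000)]
# The distribution of the difference between two sample means.
diffs = [sample_mean() - sample_mean() for _ in range(20000)]

print("mean of sample means:", statistics.fmean(means), "against mu:", mu)
print("s.d. of sample means:", statistics.pstdev(means),
      "against sigma / sqrt(N):", sigma / N ** 0.5)
print("mean difference:", statistics.fmean(diffs), "against 0")
print("variance of differences:", statistics.pvariance(diffs),
      "against sigma^2/N1 + sigma^2/N2:", sigma ** 2 / N + sigma ** 2 / N)
```

Each simulated figure should come out close to, though not exactly equal to, the value the frames predict; the agreement improves as more samples are drawn.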
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Chapter 13 


THE LAWS OF CHANCE 


INTRODUCTION

The last chapter was fairly difficult. However, with this chapter and the
following chapter we have all the props for applying significance tests to
results which are normally distributed.

This chapter itself is the basis of all statistical tests.


13.1 Many people talk about the likelihood,
chances or odds of a particular event
happening. We use the word probability.
What is the probability of tossing 'tails'
with a coin?

50 : 50

13.2 Probability is given the symbol 'p'.
Its range is 0 to 1. When p = 0 an event
is impossible. What does p = 1 mean?

That an event is inevitable.

13.3 What is p that you will die one day?

1

13.4 State an event (or maybe we should say
a 'happening' in this modern age!)
where p = 0.

e.g. Your swimming the
Atlantic, my growing a
halo, etc.

13.5 Sometimes p can be estimated logically.
The probability of drawing an ace from
a normal card pack is 1/13, i.e., you
have 1 chance out of 13. Other times
you can estimate p from the equation

$p = \dfrac{\text{total number of occurrences of the event}}{\text{total number of trials}}$

If a surgeon transplanted 200 hearts and
1 person survived the probability of
survival here is what?

$\dfrac{1\ \text{time}}{200\ \text{times}} = \dfrac{1}{200}$
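
The equation in the frame above can be tried out with exact fractions. A minimal sketch follows; the function name `estimate_probability` is an invention for this illustration, and the card example assumes the usual pack of 52 with 4 aces.

```python
from fractions import Fraction

def estimate_probability(occurrences, trials):
    """p = total number of occurrences of the event / total number of trials."""
    return Fraction(occurrences, trials)

p_survival = estimate_probability(1, 200)   # 1 survivor out of 200 transplants
p_ace = estimate_probability(4, 52)         # 4 aces in a pack of 52 cards

print(p_survival)   # 1/200
print(p_ace)        # 1/13
```

Using `Fraction` rather than decimals keeps the answers in the same form as the frames.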


13.6 This means p is equivalent to

$\dfrac{\text{the number of sheep}}{\text{total}}$

which is an example of a ......

Proportion.

13.7 The probability, p, equals ...... that
a coin will fall 'heads' and equals
...... that a '2' will be thrown with
one of a set of dice (a die in fact!).

$\tfrac{1}{2}$

$\tfrac{1}{6}$

13.8 If one event precludes the possibility of
other specified events the events are
called mutually exclusive. Surviving an
operation, refusing an operation and
succumbing during the operation are/
are not mutually exclusive events.

Are. Each of the three
possibilities excludes the
other two. Falling into one
of the groups excludes you
from the others.

13.9 Tossing a head or a tail with one throw
of a coin are/are not mutually exclusive
events.

Are. If the word 'or' is
used it infers mutually
exclusive events.

13.10 Tossing a head with one coin and a tail
with another coin are/are not mutually
exclusive.

Are not - you can do both.

13.11 So long as events are mutually exclusive
the Addition Law of Probability states
this:

The probability that an event will occur
in one of several possible ways is the sum
of the individual probabilities of these
separate events.

Therefore, what is the probability of
throwing a '6' or a '2' with a particular
one of a set of dice?

$\tfrac{1}{6} + \tfrac{1}{6} = \tfrac{1}{3}$

13.12 For the addition law to apply, the word
'or' is seen or implied. Remember the
events must be mutually exclusive. What
is the probability of drawing an ace or a
king with one cut from a pack of cards?

$\tfrac{1}{13} + \tfrac{1}{13} = \tfrac{2}{13}$

13.13 Is the probability of drawing an ace and
a king with two cuts from the pack
$\tfrac{2}{13}$?

No - they are not mutually
exclusive events; you have
two draws and can do both.
The addition law does not
apply.

13.14 What is the probability of tossing a
head or a tail with one throw?

$\tfrac{1}{2} + \tfrac{1}{2} = 1$,
i.e., it is inevitable.

13.15 When all possible outcomes of mutually
exclusive events are given, their
probabilities sum to one. The probability
of throwing a '3' with one die + the
probabilities of throwing what? = 1

Anything but a 3, i.e.
a 1 or a 2 or a 4 or a 5 or a 6
with that throw.

13.16 The probability of being blood group
Rhesus +ve equals 1 minus which
probability?

The probability of not being
Rhesus +ve (i.e. Rhesus -ve),
its mutually exclusive
event.

13.17 The probability of being Rhesus -ve
= $\tfrac{1}{10}$.
What is the probability of being
Rhesus +ve?

$1 - \tfrac{1}{10} = \tfrac{9}{10}$

13.18 What are mutually exclusive events?

Those where doing one
precludes any others.

13.19 What does the Addition Law of
Probability state?

That with mutually exclusive
events, to find the probability
of one or another happening
the individual probabilities
are added.

13.20 What is the sum of the probabilities
of a group of all possible mutually
exclusive events?

1
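
The Addition Law frames above can be checked with a few lines of Python. This is a sketch only; the helper name `p_either` is made up here, and the function cannot itself verify that the events really are mutually exclusive, so that remains the user's responsibility.

```python
from fractions import Fraction

def p_either(*probabilities):
    """Addition law: probability of one OR another of several mutually exclusive events."""
    total = sum(probabilities, Fraction(0))
    if total > 1:
        # A total above 1 shows the events cannot all be mutually exclusive.
        raise ValueError("probabilities of mutually exclusive events cannot exceed 1")
    return total

p_six_or_two = p_either(Fraction(1, 6), Fraction(1, 6))       # one throw of a die
p_ace_or_king = p_either(Fraction(1, 13), Fraction(1, 13))    # one cut of the pack
p_head_or_tail = p_either(Fraction(1, 2), Fraction(1, 2))     # all outcomes: inevitable
p_rhesus_positive = 1 - Fraction(1, 10)                       # 1 minus p(Rhesus -ve)

print(p_six_or_two, p_ace_or_king, p_head_or_tail, p_rhesus_positive)
```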


13.21 The probability of a pregnancy
resulting in a multiple birth is
$\tfrac{1}{80}$.
What is the probability of a single birth?

$1 - \tfrac{1}{80} = \tfrac{79}{80}$

13.22 The addition law on its own is of limited
use. The probability of a single birth
is $\tfrac{79}{80}$ and of being Rhesus +ve is
$\tfrac{9}{10}$.
We cannot use the addition law to
determine p for a single Rhesus +ve birth.
Why?

Because these events can
occur together. They are not
mutually exclusive.

13.23 The Multiplication Law of Probability
applies to two or more events which do
not affect each other (i.e. are independent)
occurring together. Does it apply to
mutually exclusive events?

No, these cannot occur
together.

13.24 What are independent events?

Those which do not affect
each other.

13.25 Rhesus blood grouping and multiplicity
of births are independent events. The
probability of a Rhesus +ve birth is
$\tfrac{9}{10}$ and a single birth is
$\tfrac{79}{80}$.
What, therefore, is the probability of a
pregnancy resulting in a single Rhesus
+ve birth?

$\tfrac{9}{10} \times \tfrac{79}{80} = \tfrac{711}{800}$

(The multiplication law
applies to independent
events.)

13.26 If the probability of a female birth is
$\tfrac{1}{2}$, what is the probability of a female
Rhesus +ve birth?
(These are also independent events.)

$\tfrac{1}{2} \times \tfrac{9}{10} = \tfrac{9}{20}$

13.27 The probability of a birth which is
female, single and Rhesus +ve equals
......
(Hint: the Multiplication Law applies
to 2 or more events.)

$\tfrac{1}{2} \times \tfrac{79}{80} \times \tfrac{9}{10} = \tfrac{711}{1600}$
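
The Multiplication Law frames above can be verified in the same way. Again this is only a sketch: the helper name `p_all` is an assumption of this illustration, and it is valid only when the events are independent, which the code has no means of checking.

```python
from fractions import Fraction
from math import prod

def p_all(*probabilities):
    """Multiplication law: probability of several independent events ALL occurring."""
    return prod(probabilities, start=Fraction(1))

p_rhesus_positive = Fraction(9, 10)
p_single = Fraction(79, 80)
p_female = Fraction(1, 2)

p_single_positive = p_all(p_rhesus_positive, p_single)
p_female_positive = p_all(p_female, p_rhesus_positive)
p_female_single_positive = p_all(p_female, p_single, p_rhesus_positive)

print(p_single_positive)          # 711/800
print(p_female_positive)          # 9/20
print(p_female_single_positive)   # 711/1600
```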
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3 2S 


nvolvcd mt' avvociatud (not 
)t *ach otr.w,|. Hi«> 
i law cannot lie applied Th‘ 
bnnij calourtitind tv 

rva itmjlu it - 


iy of iKiint born V'tnali 
I 


rJi, Colour lilurijnfnt K 
||'I1«;< .(1 Iy u^iCCUited with Se«, 
roloul lilunt pimple ,tie 
" flu* r uitipi •r.vtion law 
'lot-, nnr apply to two u<ch 
■ ri'jU'. «atr?il *vj cnt« 


6 6 

(Addition l-JW ruuiru 


!0 f-or Him mulriplirjlion law ro upp 
tn» word and r. utard to connect t 




1 I t 

0 6 36 

ff«* ifti|j|»tijtuin tjwl 


^ ill** prtibibitity til thro 

t :|t it ihwi „ o> 


1 

if 


Tr 


Mini.* 





ix.llab'iltv Ot 
it 2 r/t ,• ?. ar-d 


irv | * A 


36 36 18 

(Addition L.uvl 


13 


niotr ion wnw idiws tn th« mu of twt 

lllljo’ 

in r. ttir proliwhility ot 

•• rnnli? and dt**t a mute * 

.i main and thnn a 
a Inmate and then a mate? 


I 

t.il 4 
I 

(b) 4 

1 

fcl 4 

1 

fill 4 

(These are mutually
exclusive and do in fact
sum to 1.)

13.35 Therefore what is the probability of
two siblings being one male and one
female (in either sequence)?

$\tfrac{1}{4} + \tfrac{1}{4} = \tfrac{1}{2}$

(male then female, plus female then male)

13.36 Complete this table for two offspring.

Sequence      Family                 Probability
M M           2 males                1/4
F M or M F    1 male and 1 female    1/4 + 1/4 = 1/2
F F           2 females              1/4

13.37 Notice that the probability of one
female and one male is twice that for
two males. This is because two males
can only arise in one sequence but one
of each sex can arise in ......

Two.

If events can occur in more
than one sequence the overall
probability is the sum of
the probabilities for each
sequence.
13.38 7 males in a family only arise in
one sequence: male and male and
male etc. What is the probability of
7 offspring being all male?

$\tfrac{1}{2} \times \tfrac{1}{2} \times \tfrac{1}{2}$ etc. $= \left(\tfrac{1}{2}\right)^7 = \tfrac{1}{128}$

13.39 6 males and 1 female can arise
in ...... sequences.

7:

M M M M M M F or
M M M M M F M or
M M M M F M M or
M M M F M M M or
M M F M M M M or
M F M M M M M or
F M M M M M M

Each of them has a probability $\left(\tfrac{1}{2}\right)^7$
so that the overall probability
is what?

$\left(\tfrac{1}{2}\right)^7 + \left(\tfrac{1}{2}\right)^7 + \text{etc.} = \tfrac{7}{128}$

13.40 The number of sequences in which
2 females can arise is 21. The probability
for 2 females and 5 males in any order is
$\tfrac{21}{128}$.
Similarly the probability of 3 females =
$\tfrac{35}{128}$ and so on.

These probabilities are represented in
the diagram.

[Figure: histogram of the probabilities for 0 to 7 females among 7 children.]

All these alternatives are mutually
exclusive.
What is the total probability?

1

13.41 Probability can be represented by
an area so long as the total possible area
equals ...... and the area is drawn to scale.

1

13.42 What is the probability of 7 children
consisting of 0 or 1 females or 0 or 1
males?
(The shaded areas in Frame 13.40.)

$\tfrac{1}{128} + \tfrac{7}{128} + \tfrac{7}{128} + \tfrac{1}{128} = \tfrac{16}{128} = \tfrac{1}{8}$
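
The counting of sequences in the frames above can be left to a computer. The sketch below simply lists every possible sequence of 7 births and counts the females in each; it assumes, as the frames do, that male and female births are equally likely and independent. The variable names are chosen for this illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

children = 7
sequences = list(product("MF", repeat=children))   # every possible birth order
p_sequence = Fraction(1, 2) ** children            # each single sequence: (1/2)^7

# How many sequences contain 0, 1, 2, ... females?
ways = Counter(seq.count("F") for seq in sequences)

# Sum over sequences: number of sequences times the probability of each one.
probability = {k: ways[k] * p_sequence for k in range(children + 1)}

for k in range(children + 1):
    print(k, "females:", ways[k], "sequences, probability", probability[k])

# 0 or 1 females, or 0 or 1 males (that is, 6 or 7 females).
p_extreme = probability[0] + probability[1] + probability[6] + probability[7]
print(p_extreme)
```

The counts produced are the same figures (1, 7, 21, 35 and so on) quoted in the frames, and the probabilities add up to 1 because the alternatives are mutually exclusive and exhaustive.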
13.43 What is the probability of having a
value less than 20 in this histogram?
(Also the shaded area.)

[Figure]

$\tfrac{1}{10}$
(the area is 1/10 of the whole)

13.44 What is the probability of a result
greater than $\bar{X}$ in this sample
distribution?

[Figure]

$\tfrac{1}{2}$

13.45 What is the probability of a result
falling below $\bar{X}$ or above $(\bar{X} + s)$?

[Figure]

$\tfrac{1}{2} + \tfrac{1}{6} = \tfrac{2}{3}$

13.46 What does probability mean?

Likelihood, chances or
odds.

13.47 If the chances are 50 : 50,
p = ......

$\tfrac{1}{2}$

13.48 Give the formula for estimating p.

$\dfrac{\text{Total number of occurrences}}{\text{Total number of trials}}$

13.49 When would you add probabilities
together?

With mutually exclusive
events when wanting the
probability of one event
or another.

13.50 When would you multiply probabilities?

When the probability of two
or more events occurring
together is required and
they are not associated.

13.51 When events can occur in more than one
sequence the overall probability is the ......

Sum of the probabilities
for the individual sequences.


AR 


’robobiUtv mo 


th.it tie 


d tl 


yli-i.l bond, f antes or odds. It has the symbol p’ and 
i possibility! lo 1 imitvltobiMyl It is irst mated from the 

total number of iscciuttences of the event 
total iimliei of trials 

nf Praii ros'it> ippi i t.i mu'-u.;lly exclusive events which 
t t e . ;irr ence uf tine eve a excludes the possibility of 
il.ice 1 he prntTii* . Iity of one u/ mote ni.itu.illy exclusive 
• i ih • Mdlis irtu i r (irubab lit « All the probabilit ts uf 

ii/ uiplaccsn.' vw use the Muttipticattof' law of Probait>hiy 
I 1 '! . it I 7'* ii- mu r i .rifil'. isrrur ' ir*-; together [e.c| TvitiI 
• ' i "xlsi'.t -_il tht i i dividual probabilities. This is uotv 
ii f»tie c .ii*. 1 nut iissociiitctl *n any way. t v if they are 


-Cut in 


Vi hem the total atis« is 
probability 


hi.* than one sequence live overall probaoihty is 
lur 'hr imhv d ..il Sect'Mtr»c«s. 
i it proportional area can f» used to represent 
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Chapter 14 


STANDARDISING THE NORMAL CURVE 


INTRODUCTION 

In this chapter we learn to apply the ideas about probability to the normal
distribution. This is the last problem before going on to using numbers to
answer questions.


14.1 What is the most widely occurring
frequency distribution in medicine?

The normal.

14.2 You are to learn about a new
characteristic feature of the normal curve. What
characteristics do you already know?

It is symmetrical and bell
shaped with two points of
inflection.

14.3 The new characteristic is about
probability, p. Is probability a ratio,
a proportion or a rate?

A proportion:

$\dfrac{\text{number of occurrences of an event}}{\text{total number of trials}}$

14.4 A percentage is 100 times a ratio/rate/
proportion?

Proportion.

14.5 What is the percentage area beyond
$(\bar{X} + s)$ here?

[Figure: normal curve with the area beyond $(\bar{X} + s)$ shaded.]

16%



14.6 What is the proportion of area beyond
$(\bar{X} + s)$ in the last frame?

0.16

14.7 Therefore, what is the probability of a
result being bigger than $(\bar{X} + s)$ in that
normal curve?

0.16

14.8 These probabilities and % areas apply
to all normal curves.
What is the probability of obtaining
a result lower than two standard
deviations below the mean
(i.e. below $(\bar{X} - 2s)$ in Frame 14.5)?

p = 0.025 (2½%).
The same as that above
$(\bar{X} + 2s)$.

14.9 Here is a curve representing I.Q.'s.
What is the probability of having an I.Q.
above 130?
(Refer to Frame 14.5 if you need to.)

[Figure: normal curve of I.Q. with mean 100.]

130 = $(\bar{X} + 2s)$. The
probability equals that
above $(\bar{X} + 2s)$.
p = 0.025

14.10 This curve represents haemoglobin.
What is the probability of having a
haemoglobin level above 110?

[Figure: normal curve of haemoglobin with mean 100.]

0.025, the same.

14.11 You are as unlikely to have an I.Q. of
130 or bigger as to have a haemoglobin
level of 110 or above. Why is this?

Because both values lie 2
standard deviations beyond
the mean on a normal curve.
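
The areas quoted in these frames are rounded, and a computer gives them to more places. The sketch below uses `statistics.NormalDist` from the Python standard library. The standard deviations of 15 for I.Q. and 5 for haemoglobin are inferred from the frames (130 and 110 each lying 2 standard deviations above a mean of 100) and should be read as assumptions.

```python
from statistics import NormalDist

standard = NormalDist()                    # mean 0, standard deviation 1

p_beyond_1_sd = 1 - standard.cdf(1)        # area beyond (mean + 1 s.d.)
p_beyond_2_sd = 1 - standard.cdf(2)        # area beyond (mean + 2 s.d.)

iq = NormalDist(mu=100, sigma=15)          # assumed s.d. of 15
haemoglobin = NormalDist(mu=100, sigma=5)  # assumed s.d. of 5

p_iq_above_130 = 1 - iq.cdf(130)
p_hb_above_110 = 1 - haemoglobin.cdf(110)

print(round(p_beyond_1_sd, 3), round(p_beyond_2_sd, 3))
print(round(p_iq_above_130, 3), round(p_hb_above_110, 3))
```

The computed areas are a little under the 0.16 and 0.025 used in the frames, which are convenient round figures; the point that both curves give the same probability 2 standard deviations out is unaffected.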
14.12 [Figure: two normal curves, Graph A and Graph B.]

Here the probability beyond [illegible] on
Graph B is the same as beyond ......
on Graph A.

1

14.13 On Graph A in the last frame the value
2 lies ...... standard deviation(s) above
the mean, and -1 lies ...... standard
deviation(s) below the mean.

2

1

14.14 In fact in Graph A in Frame 14.12 the
numerical value itself tells you the
number of standard deviations you
are from the mean.
-3 on this normal curve is where?

3 standard deviations
below the mean.

14.15 Because Graph A in Frame 14.12 is easy
to use it is given a special name -
the standard normal curve.

It has a mean value ......
It has a standard deviation ......
and variance ......

0

1

1

14.16 Below is a drawing of the standard
normal curve. The variable is given
the symbol ...... and represents what?

[Figure: the standard normal curve.]

z

The number of standard
deviations from the mean
on a normal curve.

14.17 What is the probability of a z value
less than -1 on the standard normal
curve?

(The diagram from Frame 14.5 is
repeated here after modification.)

[Figure]

0.16

14.18 A z value of -1 on the standard normal
curve is the same as the value
(......) on all normal curves.

$(\bar{X} - s)$

14.19 Conversely the result [illegible] on normal
curve B in Frame 14.12 is equivalent
to ...... on the standard normal curve.

+3

It is 3 standard deviations
above the mean.

14.20 To recap, the z value on the standard
normal curve equals ......
on all normal curves.

The number of standard
deviations above or below
the mean.


14.21 With the result 70 here, z = ......?

[Figure]

14.22 By calculating z as in the last frame
every value on any normal curve can
be related to the ...... normal
curve.

Standard.

14.23 The z value equals the number of
standard deviations from the
...... on any normal curve.

This equals

$\dfrac{\text{the distance from the mean}}{\text{the standard deviation}}$

i.e.

$z = \dfrac{\text{the particular result} - \text{the mean}}{?}$

Mean.

The standard deviation.

14.24 Using this formula, the value 4 on the
normal curve with mean 10 and standard
deviation 6 is equivalent to a z value
of what?

$\dfrac{4 - 10}{6} = -1$

14.25 This means it lies ...... standard
deviation(s) ...... the mean.

1

below

14.26 Therefore the probability of obtaining
a result equal to or lower than 4 on a
normal curve with mean 10 and
standard deviation 6 is what?

0.16

14.27 Give the formula for calculating z.

$z = \dfrac{\text{the result} - \text{the mean}}{\text{the standard deviation}}$



14.28 The formula in the last frame for z
can/cannot be used for all normal
distributions.

Can.

14.29 This diagram relates z to ......

[Figure]

p

14.30 Therefore any value on any normal
curve can be equated to ...... by first
calculating ......

p

z

14.31 Usually, in using the normal curve to
answer questions we are interested, at
the same time, in values bigger than +z
or smaller than -z.

In Frame 14.29 the probability p of a
result bigger than z = +2 or smaller
than z = -2 is ......

0.05

14.32 Instead of using the standard normal
curve itself we can construct tables of z
and the equivalent p values.

Using the diagram in Frame 14.29 and
the result from the last frame we have

p    0.32    ?
z    1       2

0.05

14.33 Do you understand where the 0.32 came
from in the last frame where z = 1?

Yes, I hope. 16% of results
are beyond z = +1 and 16%
below z = -1, giving a total
of 32%, i.e. p = 0.32.

(The z value and p value
refer to both ends of the
normal curve together.)

14.34 Useful values for z (N.B. plus or minus!)
and the equivalent p values are given
below:

p    0.10    0.05    0.02    0.01
z    1.6     2.0     2.3     2.6

Explain again what z = 2.0, p = 0.05
means.

The probability of a result
bigger than z = +2 or smaller
than z = -2 is 0.05.

14.35 z is the variable on which curve?

The standard normal curve.

14.36 z represents ......
and may be calculated using the formula
......

The number of standard
deviations from the mean.

$z = \dfrac{\text{the result} - \text{the mean}}{\text{the standard deviation}}$

14.37 Use it, and the table in Frame 14.34
which has been transferred to the pull-out
at the back of the book, to answer
the following question.

What is the probability of a result bigger
than 22.9 or smaller than 9.1 in a
normal distribution of mean 16 and
standard deviation 3?

$z = \dfrac{22.9 - 16}{3}$ or $\dfrac{9.1 - 16}{3}$

= +2.3 or -2.3

From the table, p is 0.02.

14.38 In the table, p includes both ends of the
curve. If you had only been interested
in the last frame in the probability of
a result bigger than 22.9 you would have
...... the p value given in the
table.

Halved.
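
The table of z and p values can be reproduced, to a close approximation, by computer. In the sketch below the function name `two_tailed_p` is an assumption of this illustration; it adds together the areas at both ends of the standard normal curve, which is how the tabulated p is defined in these frames.

```python
from statistics import NormalDist

def two_tailed_p(z):
    """Probability of a result bigger than +z or smaller than -z."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

for z in (1.6, 2.0, 2.3, 2.6):
    print(z, round(two_tailed_p(z), 3))

# The example with mean 16 and standard deviation 3:
z_example = (22.9 - 16) / 3
p_both_ends = two_tailed_p(z_example)   # bigger than 22.9 or smaller than 9.1
p_one_end = p_both_ends / 2             # bigger than 22.9 only: halve it
print(round(p_both_ends, 3), round(p_one_end, 3))
```

The computed values differ slightly from the round figures in the table (for instance z = 2.0 gives a little under 0.05), because the table uses z values rounded to one decimal place.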
14.39 Complete the diagram below to indicate
from the table z = 2.3 where p = 0.02.

[Figure: standard normal curve with both ends shaded, beyond z = -2.3 and z = +2.3.]

Total shaded area
= 0.02

14.40 Draw the distribution of the sample
mean. (From memory, I hope!)

[Figure]

14.41 For any particular result $\bar{X}$ from this
normal curve,
z = ......
(using the formula)

$z = \dfrac{\text{the result} - \text{the mean}}{\text{the standard deviation}} = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{N}}$

Use this in the next frame.

14.42 Assume that psychology students have
an average I.Q. of 120 with a standard
deviation 15. You perform I.Q. tests on
a random sample of 36 such students
and calculate $\bar{X}$ to be 125. What is z's
value and the probability of such an $\bar{X}$
value as 125 or bigger turning up?

$\bar{X}$ = ......
$\mu$ = ......
$\sigma$ = ......
N = ......
z = ......
p (from the tables) = ......
but p (here) = ......

$\bar{X}$ = 125
$\mu$ = 120
$\sigma$ = 15
N = 36

$z = \dfrac{125 - 120}{15/\sqrt{36}} = 2$

The equivalent value of
p = 0.05 from the tables.
We only want the 'or
bigger' end and therefore
the required probability
is 0.025.
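
The calculation in the psychology students frame can be followed step by step in code. This is a sketch with an invented function name, `z_for_sample_mean`; it uses the standard error $\sigma/\sqrt{N}$ as the standard deviation, as the frame does.

```python
import math
from statistics import NormalDist

def z_for_sample_mean(x_bar, mu, sigma, n):
    """z for a sample mean: (x_bar - mu) divided by the standard error sigma / sqrt(N)."""
    return (x_bar - mu) / (sigma / math.sqrt(n))

z = z_for_sample_mean(x_bar=125, mu=120, sigma=15, n=36)
p_or_bigger = 1 - NormalDist().cdf(z)   # one end of the curve only

print(z)                                # 2 standard errors above the mean
print(round(p_or_bigger, 3))            # close to the 0.025 read from the rounded table
```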
14.43 95% of the results lie between ......
and ...... here.

[Figure: normal curve.]

$(\bar{X} + 2s)$

$(\bar{X} - 2s)$

14.44 Look at the distribution of the sample
mean below and compare it with the
last frame.

[Figure: the distribution of the sample mean.]

95% of sample means lie between ......
and ......

$\mu + \dfrac{2\sigma}{\sqrt{N}}$ and $\mu - \dfrac{2\sigma}{\sqrt{N}}$

14.45 We can be 95% certain that a value of $\bar{X}$
is within ...... of $\mu$.

$\dfrac{2\sigma}{\sqrt{N}}$

14.46 Sometimes $\mu$ is unknown and we require
to estimate it using $\bar{X}$. We are 95%
confident that our particular $\bar{X}$ is
within ...... of $\mu$.

$\dfrac{2\sigma}{\sqrt{N}}$

14.47 $\left(\bar{X} + \dfrac{2\sigma}{\sqrt{N}}\right)$ and $\left(\bar{X} - \dfrac{2\sigma}{\sqrt{N}}\right)$

are called the 95% confidence intervals
for estimating $\mu$. If $\sigma$ is unknown the
95% confidence intervals for $\mu$ may be
written ...... and ......

$\left(\bar{X} + \dfrac{2s}{\sqrt{N}}\right)$ and $\left(\bar{X} - \dfrac{2s}{\sqrt{N}}\right)$

Provided of course the
sample is random and in
fact the sample is fairly
large.

14.48 We wish to estimate $\mu$ for I.Q. of all
university students. We take a random
sample of 100 students and find the
mean result in this sample to be 115
with standard deviation 10. We are 95%
confident that $\mu$ is between

$\bar{X} + \dfrac{2s}{\sqrt{N}}$ and $\bar{X} - \dfrac{2s}{\sqrt{N}}$

which equals ...... and ...... here.

$115 + \dfrac{2 \times 10}{\sqrt{100}}$ and $115 - \dfrac{2 \times 10}{\sqrt{100}}$

= 117 and 113.

14.49 The distribution of the sample mean
is only distributed normally if the
samples are ......

Random.

14.50 The 95% confidence intervals can only
be used to estimate $\mu$ if the sample is
...... and ......

Random and fairly large.
(Say more than 30 or
$\sigma$ known.)

14.51 ...... and ......
are the 95% confidence intervals for $\mu$ if
the sample is random.

14.52 When are 95% confidence intervals
used?

When we require to estimate
a parameter (e.g. $\mu$) from a
statistic ($\bar{X}$).
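
The 95% confidence interval of the preceding frames is easily computed. The sketch below is an illustration under the programme's own simplification of using 2 standard errors (rather than the more exact 1.96); the function name `confidence_interval_95` is made up here, and it presumes a random and fairly large sample.

```python
import math

def confidence_interval_95(x_bar, s, n, z=2.0):
    """Return (lower, upper) limits: x_bar minus and plus z standard errors."""
    half_width = z * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

# I.Q. of university students: sample of 100, mean 115, standard deviation 10.
lower, upper = confidence_interval_95(x_bar=115, s=10, n=100)
print(lower, upper)
```

Passing `z=1.96` instead would give the slightly narrower interval found in most statistical tables.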
14.53 We calculate the mean birth-weight of a
random sample of 36 children of diabetic
mothers to be 110 oz with a standard
deviation of 30. What can we say about
the mean birth weight of all such babies?

We are 95% confident it lies
between $110 + \dfrac{60}{6}$ and $110 - \dfrac{60}{6}$

= 120 oz and 100 oz.

14.54 Complete again the distribution of
the difference of 2 sample means.

[Figure]

14.55 You have a particular value of $(\bar{X}_1 - \bar{X}_2)$.
Which formula would you use to
calculate the equivalent value of z?

$z = \dfrac{\text{the result} - \text{the mean}}{\text{the standard deviation}} = \dfrac{(\bar{X}_1 - \bar{X}_2) - 0}{\sigma\sqrt{\dfrac{1}{N_1} + \dfrac{1}{N_2}}}$

14.56 $z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{N}}$ and $z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sigma\sqrt{\dfrac{1}{N_1} + \dfrac{1}{N_2}}}$

will be used again soon.

You do not need to remember them,
although I hope you could derive them
for yourself. Could you?

Yes?


Sketch the standard normal curve 
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4.50 / - the number of what 


Standard deviations burn 

mean. 


' 4.59 What Irs the importance of the standard 
normal curved 


The / value can be derived t 
all results from normal cur* 


4.60 Using what formula for z? 


the result ihe mean 
the standard deviation 


14.61 Sketch the value of p used in table* 
relating p to z, where t wiuels 1.6. 



14.6? If you are interested in only une end 

of thr curve horv can you use the tabled 


j the recorded value of p. 


SUMMARY 

All normal curves can be adjusted so that the probability of obtaining
certain results or bigger can be calculated. They can be adjusted to the
standard normal curve which has mean equal to 0 and standard deviation 1.
The result of the standard normal curve, z, equals the number of standard
deviations a result on any other normal curve lies from the mean. z can be
calculated using the formula:

$z = \frac{\text{the result} - \text{the mean}}{\text{the standard deviation}}$

Therefore for the distribution of the sample mean

$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{N}}$

and the distribution of the difference between two sample means

$z = \frac{\bar{X}_1 - \bar{X}_2}{\sigma\sqrt{\frac{1}{N_1} + \frac{1}{N_2}}}$

Tables relating p to z are readily available. The tabulated p value is the
probability of obtaining a result bigger than +z or less than -z added
together. If you only want the probability at one end of the curve the
tabulated p value is halved.
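
For readers without the pull-out tables to hand, the relation between z and p can be approximated in code. This is a minimal sketch using only the Python standard library; the function names are illustrative assumptions, and the values are exact normal-curve areas, so they differ slightly from rounded table entries.

```python
import math


def z_score(result, mean, standard_deviation):
    """Number of standard deviations the result lies from the mean."""
    return (result - mean) / standard_deviation


def two_tailed_p(z):
    """Probability of a standard normal result beyond +|z| or below -|z|."""
    return math.erfc(abs(z) / math.sqrt(2))


def one_tailed_p(z):
    """Probability at one end only: the two-tailed value halved."""
    return two_tailed_p(z) / 2


print(round(two_tailed_p(2.0), 4))  # close to the .05 used with z = 2
print(round(one_tailed_p(1.6), 4))
```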

A confidence interval can be used to predict a likely range of values for a
parameter using statistics. The most commonly used confidence intervals
are to predict $\mu$ from $\bar{X}$. The 95% limits are

$\bar{X} + \frac{2s}{\sqrt{N}}$ and $\bar{X} - \frac{2s}{\sqrt{N}}$

in large random samples.

By substituting results from a random sample in these formulae we have
a range within which we are 95% confident $\mu$ falls, on condition that the
sample is fairly large (say bigger than 30).

Well done. From now on we will use only the ideas which we have already
met.

Chapter 15

IDEAS BEHIND SIGNIFICANCE TESTS 

Firstly, a short story, which is partly true. 

A physician and a surgeon traditionally went to play golf on Thursday
afternoons (time and weather permitting). At the Nineteenth hole it was
decided that they should each toss a coin. Should both toss 'heads' or both
toss 'tails' they would re-toss their coins until the unfortunate threw a 'head'
(and bought the drinks) and the other threw a very profitable 'tail'.
On the first three occasions the physician tossed 'tails' and the surgeon
'heads'. Very willingly did the surgeon, although a Scot, buy the
beverages. On the fourth successive occasion the surgeon emptied his
pockets rather less willingly.

Returning home after the fifth successive Thursday's expenditure, the surgeon
muttered to his wife 'Och' (for he was Scottish) 'I'm sure the physician is
above board but I do think there is something rather uncanny in his 'tail
tossing' ability'. However, the surgeon's wife was able to re-assure her husband;
'It is obviously just bad luck on your part due entirely to chance - ignore it'.

The sixth week the surgeon tossed the unlucky 'head' yet again. Although
feeling rather anti-physicians, he didn't comment. After the seventh game the
surgeon's wife was faced with a very belligerent husband (surgeons can be
belligerent!). She agreed that enough was enough. Even though there was no
proof that the physician was employing a trick (for these results could be
entirely due to chance) it was very suspicious. 'The line must be drawn
somewhere' she said. 'If it happens again, you must play golf with somebody
else'.

Ideas like these are very commonly used in analysing experimental data,
although the circumstances are usually rather different. The story has some
statistical morals.

1. Any set of results involving data subject to chance variation could be
'due entirely to chance', as the surgeon's wife pointed out.

Statistics can never prove anything.

2. Statisticians assume that chance is the only factor initially. Like the
surgeon, they give the benefit of the doubt: they assume 'the physician is
above-board'. The surgeon's wife initially thought that the results were
'due entirely to chance', too.

3. In statistical significance tests, as in the story, the time comes when 'the
line must be drawn somewhere'. Otherwise no conclusions can be drawn and
no action taken. Chance could always be to blame, but the time comes when
the evidence is such that it is more realistic to assume some other factor.

INTRODUCTION

In earlier times medical progress, when it occurred, tended to leap forward,
e.g. the discovery of the use of B12 and insulin. The results were obvious.
However, in the last decade many advances have been made by a series of
research workers each contributing a small improvement to the overall
picture. On such occasions the only way to decide that an improvement
really exists is by careful experimental design and analysis.

In this chapter we consider the ideas behind the general format of all
significance tests.


15.1 Scientific research usually follows
these steps:

a) observation of a phenomenon.

b) postulation of a theory to account
for the observation.

c) prediction of a result on the basis of
the theory.

d) experiment designed to test the
prediction.

e) analysis of experimental results.

f) conclusion as to whether or not to
accept the theory.

With which of these steps is
statistics involved?

d, e and f.


15.2 Sometimes people say that the very
nature of particularly medical data
with its inherent variability makes its
scientific analysis impracticable. This
is rubbish: statistics depends on
variability for its very existence. If all
patients reacted the same way we could
use simple arithmetic. However,
statistics can only supply a measure of
doubt, not ...

proof.


15.3 As the surgeon's wife in our story
said, the results could always be due
to what?

Entirely to chance.


15.4 Statistically, like the surgeon's wife,
we always initially assume that the
results obtained are 'due entirely to
chance' variation. This is the same as
the legal approach, a kind of initial
guilt/innocence.

Innocence.


15.5 The statement in italics in the last frame
is rather pedantically called the Null
Hypothesis. Look at Frame 11.44 again.
What was the Null Hypothesis?

That any difference between
these regimes, as indicated
by the results, was only due
to chance.


15.6 The alternative to the Null Hypothesis
is that the results obtained indicate
that there is a situation which is more
than we can reasonably account for
by chance. What is the Null Hypothesis?

That the results are only due
to chance.


15.7 'The line must be drawn somewhere.'
Persistently saying that the Null
Hypothesis could always be true gets
us nowhere. The time must come when
the evidence is such that we must stop
supporting the Null Hypothesis and give
our allegiance to the ....

Alternative.


15.8 When this point is reached our
conclusion is that we now accept/reject
the Null Hypothesis and accept/reject
the alternative theory.

Reject.
Accept.


15.9 Dr D is conducting a drug trial to
decide whether a particular type of
pneumonia responds better to injections
of
a) long-acting penicillin, b) crystalline
penicillin.

What is the Null Hypothesis and its
alternative?

The Null Hypothesis is that
any difference is just due to
chance variation. The
alternative is that the
difference in response is more
than can be expected by
chance.


15.14 We must make up our minds and draw
some conclusion with the point D. As
the surgeon's wife said in the story,
'The line must be drawn somewhere'.
The actual line is called the significance
level.

We reject the Null Hypothesis with
extreme results at either end of the scale.
Therefore we need a significance level
at both ends.
Here they are.

What conclusion may be reached about
D now?

With result D and this
significance level we still
accept the Null Hypothesis.





15.15 Why is a significance level necessary?

It enables decisions to be
made.


15.16 You could choose any significance level
you like but one commonly used, the
.05 level shown below, is such that the
total probability of a more extreme
result at either end is .05.
The shaded area under this curve
represents how much of the total?

The total shaded area is .05
or 5%, so the .05 significance
level includes the probability
at either end.


15.17 What does the .05 significance level mean
in the case where 100 trials are performed?

That in 5 of these trials a more
extreme result would occur by
chance.


15.18 The other commonly chosen level is
called the ...... significance level
shown below.

.01
(.005 at each end).
Occasionally the .001 level
is used.


15.19 The ...... is the yardstick
against which the evidence in support
of and against the Null Hypothesis is
measured.

Significance level.


15.20 What is the .05 significance level?

The line at which the
probability of a more extreme
result is .05.


15.21 The amount of evidence in support
of the Null Hypothesis is called p.
If p is less than the significance
level you accept/reject the Null
Hypothesis.

Reject.
There is insufficient evidence
to support the Null Hypothesis.


15.22 The value p is the probability
of such a result or a more
extreme result than the one
obtained, arising by chance. The
smaller p the less the evidence to
support the Null Hypothesis and the
greater the evidence to support
the ......


Alternative.


15.23 Imagine p represents a mini skirt.
The smaller p the more mini the skirt!
When p becomes less than the significance
levels there is insufficient evidence to
support the Null Hypothesis and now
we can subscribe to a 'real difference'.

[Figure labels: Significance level (.05); Significance level (.01); Vive la difference!]


15.24 As p becomes smaller and smaller and
creeps beyond the very small significance
levels so the difference becomes more and
more apparent.
(Statistically Significant)


15.25 If p is less than .01 it is more/less
significant than if it was only less
than .05.

More.


15.26 This symbol, >, means 'bigger than'. At
the end of an article in a journal you
read '.05 > p > .01'. Therefore what
conclusion is drawn at the .05 significance
level?


p is less than .05. The
Null Hypothesis is rejected
and you accept the real
difference (alternative). (The
skirt is midi!)


15.27 '.05 > p > .01'.
At the .01 significance level what
conclusion would be drawn?

That you still accept the
Null Hypothesis; the evidence
has not yet crept beyond
the .01 significance level.


15.28 You read '.01 > p'. What is your
conclusion?

You reject the Null
Hypothesis and accept
the alternative at this
significance level.


15.29 If you calculate p to be .04, relative
to the .05 level, .05 ... p (symbol)
and your conclusion is?

.05 > p.
You reject the Null
Hypothesis and accept
the alternative theory.


15.30 These are the steps in performing a
significance test. Number them in the
correct order.

a) Calculate p.
b) State the Null Hypothesis and its
alternative.
c) Draw conclusions.

a) is 2nd.
b) is 1st.
c) is 3rd.


15.31 Dr C wonders whether more boys or
more girls get a particular complication
of bilharzia. Of the 7 cases reported
6 were male and 1 female.
(Fictitious Data)
State the Null Hypothesis and its
alternative.

The Null Hypothesis is that
any difference in the results
is due to chance.
The alternative is that there
is a real difference between
boys and girls.


15.32 Initially we assume the Null Hypothesis
is correct. Therefore, the probability
of a boy suffering the complication
rather than a girl by chance is ......

$\frac{1}{2}$


15.33 Look at Frame 13.40 again. What is
the probability of the 7 cases being
6 boys and 1 girl?

$\frac{7}{128}$


15.34 The p value is the probability of such a
result or a more extreme result
occurring by chance (including both
ends of the scale). From
Frame 13.40 p equals ......
(The shaded areas)

$\frac{1}{128} + \frac{7}{128} + \frac{7}{128} + \frac{1}{128} = \frac{16}{128} = 0.125$

(7 of one sex is more extreme
than 6 to 1, and both possibilities
are included.)


15.35 p = 0.125
Your conclusion?
(Your p value is greater than the
significance levels.)

You accept the Null
Hypothesis at all
significance levels because
p > .05 and p > .01.


15.36 You have just performed your first
significance test.
List the steps.

a) State the Null Hypothesis
and alternative.
b) Calculate p.
c) Draw the conclusion.

15.37 Assume all 7 cases had been males.
By completing the 3 stages, perform
the significance test again.
a) The Null Hypothesis and
alternative remain the same.
b) Your p value (Frame 13.40) ......
c) Your conclusion is ......

b) $\frac{1}{128} + \frac{1}{128} = 0.016$

c) .05 > p > .01

p is less than .05. You
accept the alternative at
this level.

p is more than .01.
Here still accept the
Null Hypothesis.


15.38 The different significance levels have
resulted in a different conclusion. Less
evidence is required to accept a theory at
the .05 significance level than at the .01.
This is the reason why a theory accepted
at .01 is said to be more significant. It
is also the reason why if you accept the
Null Hypothesis it can be due to one
of two causes.
Either: there is insufficient
evidence as yet to accept
the alternative,
or ......


There is in fact no real
difference.


15.39 To distinguish between these two
causes you must do what to your
samples?

Increase their size.


15.40 Occasions arise when the theory does
not involve both ends of the scale but
only one. A new drug may be more
expensive and unless it is better than
the old, people are not interested.
They are not interested in which
outcome?

Whether the old drug is
better than the new.
(Unless the new drug is
better than the old one
you can forget this other
extreme.)


15.41 When only one end is important the
significance levels used are still .05
and .01, but the probability areas
now only apply to one end, i.e. for
the one-tailed test here the shaded
area is ...... of the whole area.

.05 or 5%.


15.42 When the significance level only refers
to 1 tail, of course, the p value you
calculate also only applies to that tail.
If p > .05, your conclusion, as before,
is ......

That you accept the Null
Hypothesis: either there
is no real difference or
there is insufficient
evidence.


15.43 Another time when only one tail is
used is when previous knowledge
can exclude one possible extreme.
Mrs H's theory is that bilharzia lowers
the I.Q. of the patient. (She knows it
doesn't increase it!) The stated
...... enables you to decide
whether both or only one tail
should be used.


Alternative to the Null
Hypothesis.


15.44 Frame 11.44 required one/both
tail(s). Why?

Both. The theory relates
to either of the dietary
regimes being better. With
one end we would only be
interested in one regime
being better.


15.45 A drug company runs a trial of a
new drug Y and its older counterpart
X initially on hamsters, to see
whether Y is better than X. Six
hamsters responded better to Y and
one to X. (Each hamster received each
drug but with a sufficiently long time
interval in between so that there was
no carry over of the effect of the drug.)
This requires use of one/both tail(s).

One. We are only concerned
with the new drug being
better than the old.


15.46 To draw conclusions from the last
frame we must follow what steps?

a) State the Null Hypothesis
and alternative.
b) Calculate p.
c) Draw conclusion.


15.47 State the Null Hypothesis and its
alternative.

That any difference is
due to chance.
That Y is better than X.


15.48 What is the probability of a hamster
improving more with drug Y than drug
X, assuming the Null Hypothesis is
correct?

$\frac{1}{2}$


15.49 To calculate p here you do/do not
include both ends.

Do not.


15.50 Using Frame 13.40 again, p here = ......

$\frac{1}{128} + \frac{7}{128} = \frac{8}{128} = .06$

Remember p is the probability
of that result or a more
extreme result occurring by
chance, but only at one end
here.
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15.51 

Symbolise this result using the
significance levels, p, and the
'greater than' sign (>).

p > .05


15.52 Therefore what conclusion do you
draw?

Either there is no difference
or there is insufficient
evidence.


15.53 To distinguish between these two
conclusions in the last frame you
must do what?

Increase the sample size.


15.54 What is the first stage in performing
a significance test?

State the Null Hypothesis
and its alternative.


15.55 What decision is based on this
alternative?

Whether one or both ends
of the scale are to be used.


15.56 What does the Null Hypothesis
state?

That any difference is due
to chance.


15.57 What is the purpose of a
significance level?

It enables decisions to be
made.


15.58 Which significance levels are
commonly used?

.05 and .01
(occasionally .001)


15.59 A significance level of
...... enables more
alternatives to the Null Hypothesis
to be accepted. However with it the
Null Hypothesis will be wrongly
rejected in ...... cases out of 100.

.05

5 (This is a price you must
pay to reach a conclusion.)


15.60 Your next step in performing a
significance test is to calculate p.
What does p represent?

The probability of such an
extreme or more extreme
result occurring by chance,
assuming the Null
Hypothesis is correct.


15.61 If p is less than the significance
level, what is your conclusion?
(e.g. .01 > p)

The Null Hypothesis is
rejected and the alternative
accepted.


15.62 Otherwise, if p > .05, what is your
conclusion?

Either there is insufficient
evidence to reject the Null
Hypothesis or there is no
real difference.


15.63 Practical Example

Look back at the golfing story, where the surgeon was left puzzling about
the physician's ability to toss tails so consistently. After how many rounds
should the surgeon's wife have persuaded her husband to call a halt?
Remember, if they both tossed heads or both tossed tails they threw again.
Ignore the probability of this happening and just calculate, the Null
Hypothesis being assumed correct, the probability of the surgeon throwing
heads and the physician tails and vice versa.
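
One way to explore this example, offered as a sketch rather than as the book's worked answer: once re-tosses are ignored, each decisive round is a 1/2 chance either way under the Null Hypothesis, so a run of n losses has probability (1/2)^n at one end, or twice that if the opposite run is counted as equally extreme. The function name and the `tails` parameter are assumptions made for illustration.

```python
def rounds_needed(level, tails=2):
    """Smallest run of consecutive losses whose chance probability is below `level`."""
    n = 1
    while tails * 0.5 ** n >= level:
        n += 1
    return n


for level in (0.05, 0.01):
    print(level, rounds_needed(level, tails=2), rounds_needed(level, tails=1))
```

Whether one or both tails apply depends on how the alternative to the Null Hypothesis is stated, as the earlier frames explain.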




SUMMARY 

Statistics deals with material subject to inherent variability and helps by
providing a measure of doubt about theories. These theories can never
be proved. Significance tests enable research workers to draw conclusions.

The stages in performing a significance test are:

a) State the Null Hypothesis and its alternative.

b) Calculate p.

c) Draw conclusions.

The Null Hypothesis states that the experimental results are not due
to the theory, but only due to chance variation. It is accepted as true until
sufficient evidence is collected to reject it.

The usual significance levels are .05 and .01. They provide a yardstick
against which the evidence is measured. The .05 significance level means
that in 5 times out of 100 (a probability of .05) such an extreme value
or more extreme value would occur by chance. The Null Hypothesis is
rejected more often if the .05 level is used rather than the .01 level and
more significant differences are found, but the Null Hypothesis is wrongly
rejected in 5 tests out of 100.

The p value for the experimental result is the probability of the actual
experimental result or more extreme results arising by chance alone. Usually
p includes the equally extreme results at both ends of the scale (a two-tailed
test); in this case the significance levels also include both ends. If one end of
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SUMMARY (contd.) 




If .05 > p > .01 the alternative is accepted at the .05 significance level, but
not at the .01 level. An alternative to the Null Hypothesis accepted at .01
is more significant than one at .05. Occasionally results are significant at the
.001 significance level, which is very significant. Even when this is so it does
not prove that there is a real effect. It means that we should provisionally
accept the idea of a real difference rather than suppose that a very
improbable chance result has occurred.
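
The final stage, drawing conclusions, amounts to comparing p with each significance level. The sketch below is illustrative only; the function name and the wording of the returned strings are assumptions, not terms from the text.

```python
def conclusions(p, levels=(0.05, 0.01, 0.001)):
    """Map each significance level to the conclusion drawn for a given p."""
    return {
        level: "reject Null Hypothesis" if p < level else "accept Null Hypothesis"
        for level in levels
    }


# Frame 15.37: all 7 cases male, two-tailed p = 2/128 (about .016).
for level, verdict in conclusions(2 / 128).items():
    print(level, verdict)
```

For p = 2/128 this reproduces the split conclusion of '.05 > p > .01': rejection at the .05 level but not at .01.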
Chapter 16 SIMPLE TESTS WITH z


INTRODUCTION 

The ideas of significance tests can be extended to various practical situations.
These last 4 chapters will show you how to apply significance tests to:

1. Large samples where the data is quantitative (in this chapter).

2. Small samples where the data is quantitative (in the next chapter).

3. The correlation coefficients (in chapter 18).

4. Qualitative data (in the final chapter).

Further ideas used in this chapter involve the distribution of $\bar{X}$ and
$(\bar{X}_1 - \bar{X}_2)$, and z. Reread the summaries of chapters 12 and 14 if your
memory is rusty and if need be re-read the two chapters.


16.1 What are the stages in significance
testing?

State the Null Hypothesis
and its alternative.
Calculate p.
Draw conclusions.


16.2 We are going to test these results.
Mrs H. wants to know whether the
I.Q. of children with bilharzia is
lower than normal. She uses an
intelligence test which has been
so designed to give a population mean
of 100 with a standard deviation 12.
She finds that her random sample
of 36 students with bilharzia have a
mean result of 96.

State the Null Hypothesis and its
alternative.

This test involves ...... tail.

The Null Hypothesis is that
any experimental difference
is only due to chance
variation and the alternative
is that children with bilharzia
have a lower I.Q. than
normal.

One.
We are not testing whether
they have a higher I.Q.


16.3 We initially assume that the ......
is true.

Null Hypothesis.


16.4 This being so, we can initially assume
that children with bilharzia are no
different from the general population, so
far as their I.Q. is concerned. Therefore,
Mrs H's value $\bar{X}$ can be assumed to have
been taken at random from the
general distribution of $\bar{X}$. Complete the
general distribution of $\bar{X}$, the sample
mean.


16.5 What is the other name and symbol
for its standard deviation?


16.6 Draw the specific distribution of $\bar{X}$
drawn from the population in
Frame 16.2 using the values for $\mu$, $\sigma$ and
N given there.


16.7 Mark in Mrs H's result $\bar{X}$.
How many standard errors away from
the mean is it?
What is its z value?
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6.8 


We can check that z = -2.
Complete this formula for calculating z:

$z = \frac{\text{the result} - (a)}{(b)}$

(a) The mean.
(b) The standard deviation.


16.9 For the distribution of $\bar{X}$ in general:
The mean = ?
The standard deviation = ?

The mean = $\mu$
The standard deviation = $\frac{\sigma}{\sqrt{N}}$


16.10 Substituting the results from Frame 16.2
directly in this equation we calculate
z = ......

$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{N}} = \frac{96 - 100}{12/\sqrt{36}} = \frac{-4}{2} = -2$ again.


16.11 Mark Mrs H's result in the standard
normal curve.


16.12 What is the next step in performing any
significance test?

Calculate p.


16.13 What does p represent?

The probability of the result
or a more extreme result
arising by chance.


16.14 Look at Frame 16.2 again. Is Mrs H
interested in both ends of the scale?

No. Only the end where
her sample I.Q. is lower
than normal.


16.15 Shade the equivalent p area in this
test (using the results in Frame 16.11).


16.16 What is the size of the area equivalent
to this value p? The diagram in Frame
14.5 is repeated here.

.025 (2½%)


16.17 Confirm this result from the tables
relating p to z in the pull-out. The p value
there refers to one/two end(s) of the
distribution.

Two.
p = .05 (past z = ±2)
at both ends together.
p = .025 (past z = -2)
at one end.


16.18 p = .025 in this significance test.
Is .05 > p > .01?

Yes.
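
Mrs H's test can be retraced in a few lines. This is a minimal sketch, assuming Python's standard `statistics.NormalDist` in place of the pull-out tables; the function name is illustrative. Note that the exact one-tailed area beyond z = -2 is about .023, which the frames round to .025 because they treat z = 2 as the 5% point.

```python
import math
from statistics import NormalDist


def z_for_sample_mean(sample_mean, mu, sigma, n):
    """z = (sample mean - population mean) / (sigma / sqrt(n))."""
    return (sample_mean - mu) / (sigma / math.sqrt(n))


# Frame 16.2: population mean 100, standard deviation 12, sample of 36 with mean 96.
z = z_for_sample_mean(96, 100, 12, 36)
p_one_tail = NormalDist().cdf(z)  # area below z, the 'lower I.Q.' end only

print(z)                     # -2.0
print(round(p_one_tail, 4))  # roughly .023, between .05 and .01
```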
16.19 What conclusion does Mrs H draw?

The result is significantly
lower at the .05 significance
level.

Children with bilharzia have
a significantly lower I.Q. at
this level.

(N.B. Significance tests say
nothing about whether
bilharzia causes the lower
I.Q., only that there is in
fact a relationship. Those with
lower I.Q.s may in fact have
been more likely to contract
the disease by swimming in
infected water.)

There is as yet insufficient
evidence for a significant
difference at the .01
significance level.


16.20 You have completed your first
significance test using z. As z gets
bigger the p value gets ......

Smaller.


16.21 The Null Hypothesis is rejected if p
is bigger/smaller than the significance
level. Therefore the Null Hypothesis is
rejected if the z you calculated from
the results is bigger/smaller than the
significant value of z.

Smaller.
Bigger.


16.22 If Mrs H had not known $\mu$ she would
have estimated it using the mean of
a ...... sample.

Control.


16.23 Of what would her control sample
have consisted?

A random sample of similar
children without bilharzia.


16.24 If Mrs H had not known $\sigma$ she could
have calculated ...... instead.


16.25 State a formula she might have used.

$s = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$

or

$s = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N - 1}}$
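
The two standard forms of the formula for s give the same value, which a short sketch can confirm. The function names and the three scores are hypothetical, chosen only to keep the arithmetic easy to follow; they do not come from the text.

```python
import math


def s_from_deviations(values):
    """s = sqrt( sum (X - mean)^2 / (N - 1) )."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


def s_from_totals(values):
    """s = sqrt( (sum X^2 - (sum X)^2 / N) / (N - 1) ), avoiding the mean."""
    n = len(values)
    total = sum(values)
    total_of_squares = sum(x * x for x in values)
    return math.sqrt((total_of_squares - total ** 2 / n) / (n - 1))


scores = [96, 100, 104]  # hypothetical I.Q. results, for illustration only
print(s_from_deviations(scores), s_from_totals(scores))  # 4.0 4.0
```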


16.26 Look at these survey results. State the
Null Hypothesis and its alternative.
Mrs H wants to know whether there is
any difference in the mean weight of
children aged 8 years with bilharzia
compared with those without.
She calculated the mean weight of a
random sample of 50 bilharzial
children as 60.2 lb. and of a random
sample of 50 non-bilharzial children as
62 lb. The standard deviation is 5 lb.

The Null Hypothesis states
that any difference is due
to chance whereas its
alternative is that children
with bilharzia have a
different weight to normal
children.


16.27 Assuming the Null Hypothesis is true,
the sample of children with bilharzia
and her control are presumed to come
from the same ......

Population.


16.28 Therefore the difference between the
bilharzial sample mean and the control
mean follows which distribution?

The distribution of the
difference between two
sample means.


16.34 What is z calculated to be from the
data in Frame 16.26?

$z = \frac{62 - 60.2}{5\sqrt{\frac{1}{50} + \frac{1}{50}}} = 1.8$

i.e. the same result.


16.35 Mrs H in Frame 16.26 is concerned with
one/two tails, so the significance level
applies to one/two tails?

Two.
Two (you can use the table
in the pull-out directly).


16.36 What is the equivalent significant value
of z using the tables relating z to p
in the pull-out

(a) if the significance level is .05?

(a) For .05, the significant
value of z = 2.0

(b) if the significance level is .01?

(b) For .01, the significant
value of z = 2.6
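
The significant values 2.0 and 2.6 are rounded table entries. As a hedged sketch using Python's standard `statistics.NormalDist` (the function name and the `tails` parameter are assumptions for illustration), the unrounded values can be computed for comparison:

```python
from statistics import NormalDist


def significant_z(level, tails=2):
    """z beyond which the tail area(s) add up to the significance level."""
    return NormalDist().inv_cdf(1 - level / tails)


print(round(significant_z(0.05), 2))           # 1.96, which the text rounds to 2.0
print(round(significant_z(0.01), 2))           # 2.58, which the text rounds to 2.6
print(round(significant_z(0.05, tails=1), 2))  # 1.64 for a one-tailed test
```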
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637 


We decided that we would reject the
Null Hypothesis if the z calculated
from the results is bigger than the
significant value of z.

Here calculated z = 1.8

Significant z = 2.0 (.05) or 2.6 (.01)

Sketch this idea.

What is your conclusion?

We accept the Null Hypothesis.
Either there is no difference
or there is insufficient
evidence.
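
The comparison in this frame can be reproduced with Mrs H's weight data from Frame 16.26. The sketch below assumes a common known standard deviation of 5 lb. as stated there; the function name is illustrative, and the exact p comes from `statistics.NormalDist` rather than from the pull-out tables.

```python
import math
from statistics import NormalDist


def z_two_means_known_sigma(mean1, mean2, sigma, n1, n2):
    """z = (mean1 - mean2) / (sigma * sqrt(1/n1 + 1/n2))."""
    return (mean1 - mean2) / (sigma * math.sqrt(1 / n1 + 1 / n2))


# Control mean 62 lb., bilharzial mean 60.2 lb., 50 children in each sample.
z = z_two_means_known_sigma(62, 60.2, 5, 50, 50)
p_two_tail = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(z, 2))           # about 1.8, smaller than the significant value 2.0
print(round(p_two_tail, 3))  # bigger than .05, so the Null Hypothesis is accepted
```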


16.38 Your calculated value of z from the
experiment was bigger/smaller than the
equivalent significant value of z, so p was
bigger/smaller than the significance
level and so the Null Hypothesis was not
rejected.

Smaller.
Bigger.


16.39 If you know the value of $\mu$ you do/do
not need a control sample, and under the
Null Hypothesis the sample mean follows
which distribution?

Do not.
The distribution of sample
means.


16.40 Under what condition?

If the sample is random
and fairly large.


16.41 Then z can be calculated using which
formula?

$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{N}}$


16.42 If you do not know $\mu$ you can estimate
it from a control sample and then
z = ?


16.43 When you do not know $\sigma$ use ......

s from the samples.


16.44 There is a value of s in each sample
and it is usual to use both as follows.
If the variance in the first sample is
calculated to equal $s_1^2$
and the other variance is $s_2^2$
you can calculate the overall standard
deviation using the formula

$\sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}}$

N.B. when $\sigma$ is known
and $s_1^2 = s_2^2 = \sigma^2$
this reverts to

$\sigma\sqrt{\frac{1}{N_1} + \frac{1}{N_2}}$


16.45 Look at this survey.
Professor T wishes to know whether
there is any difference in width of a
particular thoracic vertebra between
different ethnic groups. He measures,
using a standardised X-ray procedure,
the relevant width on the random
samples of 64 Zambians and 32
Portuguese East Africans. For the
Zambians the mean width was 7.31
units (Variance .06) and for the
Portuguese East Africans the mean was
7.16 units (Variance .05). Perform the
necessary test by completing the
schedule below.

State the Null Hypothesis.

The difference is only due
to chance.

State its alternative.

There is a significant
difference between the
widths.

Under the Null Hypothesis these
particular experimental results follow
which distribution?

The difference between
two sample means.
(contd. on opposite page)
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(contd ) 

You do not know $\sigma^2$ so you calculate
$s_1^2$ and $s_2^2$ and use z = (formula)

$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}}}$

equivalent to

$z = \frac{\bar{X}_1 - \bar{X}_2}{\sigma\sqrt{\frac{1}{N_1} + \frac{1}{N_2}}}$

z here = (value)

$\frac{7.31 - 7.16}{\sqrt{\frac{.06}{64} + \frac{.05}{32}}} = 3$

Professor T is interested in ......
tail(s).

2

The equivalent z value to your
significance level (pull-out) =

2.0 for a significance level of
.05, 2.6 for a significance
level of .01.

Calculated z from the experiment is
more/less than the significant value of z
and p is more/less than the significance
level.

More.
Less.

Therefore the conclusion is ......

That there is a significant
difference between
Zambians and Portuguese
East Africans
(.01 > p > .001).


16.46 Practical Example

Repeat the format on the previous frame
to decide whether the means here are
significantly different.

Dr A wanted to know whether inclusion
of B12 into guinea pig diets increased the
weight gain. After a fixed period he
found the mean weight gain of 50
randomly chosen guinea pigs without
B12 was 5.2 oz. (s = 1.4) and of 50
randomly chosen guinea pigs given B12
was 5.5 oz. (s = 1.3).
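
The schedule completed for Professor T in Frame 16.45 can be sketched as a reusable calculation, which may help when repeating the format for Dr A's data. The function name is an illustrative assumption; the inputs are variances, so a standard deviation s would need to be squared first. The p value uses `statistics.NormalDist` rather than the pull-out tables.

```python
import math
from statistics import NormalDist


def z_two_sample(mean1, var1, n1, mean2, var2, n2):
    """z = (mean1 - mean2) / sqrt(var1/n1 + var2/n2), using sample variances."""
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2) / standard_error


# Frame 16.45: 64 Zambians (mean 7.31, variance .06),
# 32 Portuguese East Africans (mean 7.16, variance .05).
z = z_two_sample(7.31, 0.06, 64, 7.16, 0.05, 32)
p_two_tail = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(z, 2), round(p_two_tail, 4))  # approximately 3.0 and 0.0027
```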
16.47 You are possibly concerned about the
tie-up between the significance levels,
.05 and .01, and p on the one hand, and
the significant value of z and calculated
z on the other.
Does this diagram help?

Yes, I hope.
The significance levels .05
and .01 are equivalent to the
significant value of z.
p and calculated z relate to
the actual results calculated
from the data.


16.48 The experimental results themselves
provide a value z called 'calculated z'.
It is a measure of how extreme your
particular result is, the Null Hypothesis
being ......

True.


16.49 p is the probability of such an extreme
or more extreme result of calculated z
occurring by chance. The value of p
increases as calculated z ......

Decreases.


16.50 You reject the Null Hypothesis if p is
smaller/greater than the significance
level, i.e. if calculated z is smaller/
greater than the significant value of z.

Smaller.
Greater.
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16.51 4 condition* for applying z tests should 

be satisfied before they can be used.

a) The sample must be chosen .........               Randomly.

b) The data must be qualitative/
quantitative.

c) The variable must be ......... distributed in
the population.

d) The size of the sample must be 30 or
greater (there is 1 exception we will
mention again in the next chapter).


16.52 We have used z tests on the data in
Frames

16.2

16.26

16.45

16.46

Were all these conditions satisfied? Yes.


Random samples.
Quantitative data.
Normal distribution.


16.53


In the next chapter we will solve
problems where small samples are
involved but the three conditions
otherwise are as for z tests.
State them.


Quantitative.


Normally (although this is
not important, in fact, if the
samples are particularly
large).




SUMMARY

Simple tests involving z are used

1. When the samples are random.

2. When the data is quantitative.

3. Usually when the variable is normally distributed in the population.

4. Usually when the samples involved are bigger in size than 30.


(contd. on next page)
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SUMMARY fcontd) 

A. The first test is to test the difference between a sample mean and a known
value of μ.

Here $z = \dfrac{\bar{X} - \mu}{s/\sqrt{N}}$ or, if σ is known, $z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{N}}$.

B. The second test is to test the difference between two sample means or
a sample mean and a control mean.

Here $z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2}}}$ if σ is not known,

or $z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_1^2}{N_1} + \dfrac{\sigma_2^2}{N_2}}}$ if σ is known.


As the calculated value of z increases, p, the probability of a more extreme
value by chance, decreases.


The Null Hypothesis is rejected if the significance level is greater than p.
This is equivalent to the particular significant value of z being smaller than
z calculated from the results.
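
The link between calculated z and p in this summary can be illustrated with a minimal Python sketch. It assumes a two-tailed test and obtains the tail area of the standard normal distribution from `math.erfc`; the function name `two_tailed_p` is illustrative only.

```python
import math

def two_tailed_p(z):
    """Probability of a result at least as extreme as z, in either tail."""
    return math.erfc(abs(z) / math.sqrt(2))

# p falls as calculated z rises; 2.0 and 2.6 are the rounded significant
# values of z quoted for the .05 and .01 levels.
for z in (1.0, 2.0, 2.6, 3.0):
    print(f"z = {z:.1f}   two-tailed p = {two_tailed_p(z):.4f}")
```

Reading down the printed column shows p decreasing as z increases, so a calculated z beyond the significant value always goes with a p below the significance level.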
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Chapter 17 


SIMPLE TESTS WITH STUDENT'S T 


INTRODUCTION 

The method for significance testing when the samples are smaller than 30, say,
was discovered by a man called Gosset in 1908. At the time he was employed
by Guinness Brewery in Dublin. The firm's regulations required him to use a
pen name and he chose the name 'Student'; 't' was the symbol later
introduced in connection with the distribution used, which is consequently
known as Student's 't'.

17.1 What are the criteria for using z? Random samples.

Quantitative data.
Normal distribution.
Sample size at least 30.

17.2 Occasionally you can use the z tests even

if N is less than 30. The requirement is
that σ is known accurately and not
estimated using s.


17.3 If N is less than 30 and σ is unknown,
t tests are required.

Complete this table of tests to be used.

              N < 30   N 30 or more                  N < 30   N 30 or more
σ known         ?          ?            σ known        z          z
σ unknown       ?          ?            σ unknown      t          z

(In fact, t can be used
whenever σ is unknown,
but when N is bigger than
30, t becomes so like z that
z can be used instead.)
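
The table in Frame 17.3 amounts to a two-question decision, sketched here in Python. The function name `choose_test` is hypothetical, and the sketch assumes the other criteria (random sample, quantitative data, normal distribution) are already met.

```python
def choose_test(n, sigma_known):
    """Return 'z' or 't' following the table in Frame 17.3."""
    if sigma_known:
        # With sigma known accurately, z can be used whatever the sample size.
        return "z"
    # With sigma estimated by s, t is needed for samples smaller than 30.
    return "z" if n >= 30 else "t"

for n in (16, 50):
    for sigma_known in (True, False):
        print(n, sigma_known, choose_test(n, sigma_known))
```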


17.4 The criteria for using t are otherwise
the same as for z. State the criteria for
using t tests.


Random samples.
Quantitative data.
Normal distribution.
Sample size less than 30
and σ unknown.
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When t is uwtd we never use . 
but .........

σ, s


17.6 State the formulae for calculating s².

$s^2 = \dfrac{\sum (X - \bar{X})^2}{N - 1} = \dfrac{\sum X^2 - \dfrac{(\sum X)^2}{N}}{N - 1}$


17.7 Here is an example.

Would you use t or z?

Why?

Dr C is interested to know whether
people who have had heart attacks
have a blood cholesterol level different
from the normal level of 180 mg/100 ml.
He has a random sample of 16 patients
and calculates their mean blood
cholesterol as 195 mg/100 ml with a
variance of 900.

t.

Random sample.

Quantitative data.

Blood cholesterol levels
can be assumed to follow
the normal distribution
in the population.

N is less than 30 and σ is
unknown.


17.8 What are the stages in performing a
significance test?

a) State the Null Hypothesis
and its alternative.

b) Calculate p (z or t).

c) Draw conclusions.


17.9 Perform the first stage for the data in
Frame 17.7.

The Null Hypothesis is that
the difference is only due to
chance.

The alternative is that people
surviving heart attacks have
a different blood cholesterol
level.


17.10 A t distribution is very like the standard
normal distribution. The formula for
calculating t is very similar to that for
calculating .........

z.
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17 U p in Frame 17 7 is known. 

Therefore if 30 or more patients had been
used, which formula would you have used
for z?

(Look back to the summary at the end
of Chapter 16 if you have a memory like
a sieve!)

Not $z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{N}}$

but $z = \dfrac{\bar{X} - \mu}{s/\sqrt{N}}$

as σ is not known in this
example.


17.12 $t = \dfrac{\bar{X} - \mu}{s/\sqrt{N}}$

It doesn't equal $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{N}}$. Why?

Because if σ is known, t
wouldn't be used.


17.13 What is the value of t in Frame 17.7?

s² = 900, so s = 30.

$t = \dfrac{195 - 180}{30/\sqrt{16}} = \dfrac{15}{7.5} = 2$


17.14 Calculated t = 2. The next problem is to
find the ......... of t.

significant value.


17.15 This is not so straightforward as for z,
as there are a series of t distributions
which cannot all be standardised to
one t distribution. They all depend on
the symbol f.

f equals the value of the denominator
when s² is calculated. What value has f
in Frame 17.7?

f = 15. Because

$s^2 = \dfrac{\sum (X - \bar{X})^2}{N - 1} = \dfrac{\sum (X - \bar{X})^2}{15}$ here.

15 is the value of the
denominator.
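
The calculation for Dr C's data in Frames 17.13 to 17.15 can be reproduced with a few lines of Python. This is only a sketch of the formula t = (X̄ − μ)/(s/√N) with f = N − 1; the function name is illustrative.

```python
import math

def one_sample_t(sample_mean, mu, variance, n):
    """Return calculated t and the degrees of freedom f for one sample mean."""
    s = math.sqrt(variance)             # s = 30 when the variance is 900
    t = (sample_mean - mu) / (s / math.sqrt(n))
    f = n - 1                           # the denominator used when s squared is calculated
    return t, f

# Frame 17.7: 16 patients, mean 195 mg/100 ml, variance 900, normal level 180.
t, f = one_sample_t(195, 180, 900, 16)
print(f"calculated t = {t:.2f} with f = {f}")
```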
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17.16 


The significant value of t which you
require is that where ......... is .........


17.17 The t tables, like the z tables, include the
area in one/two tails?                               Two.


17.18 In Frame 17.7 Dr C is interested in
one/two tails?                                       Two.


17.19 Therefore you need the significant value
of t when f = 15 in the column of the
significance levels for 2 tails.

Take a big breath and look at the t tables
in the pull-out. What is the required                2.131 for .05
significant value of t?                              2.947 for .01


17.20 The conclusions for calculated and
significant t are the same as they would
have been had we a calculated z and a
significant z.

What is your conclusion here?

In both cases calculated t is
less than the significant t.
You retain the Null
Hypothesis. Your conclusion
is either that there is no real
difference or that there is
a real difference and
insufficient evidence.


17.21 Do you remember how you could
distinguish between those 2 conclusions?

Yes, I hope.
Make N bigger.


17.22 Dr Clever Dick tells Dr C that in fact he
should only have tested to see whether
the blood cholesterol level was higher
than normal. (The literature excludes the
possibility of it being lower.) The test
then would include one/two tails?                    One only.
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17 23 

Remember that the t tables include
both tails. What are the significant values
of t now?

1.753 for .05
2.602 for .01


17.24 Dr Clever Dick's conclusion, if he had
been performing this experiment, would
have been what?
(Remember we calculated t to be 2.)

He would have accepted the
alternative as true at .05 but
not at .01 (.05 > p > .01).
Actually he was quite clever
to get a significant result.
This indicates how significance
tests are made more sensitive
if one-tailed tests can be used.


17.25 When N = 20, f = .........

19.


17.26 As f gets bigger the t distribution
becomes more nearly normal until when
f = ......... we can safely use z tables
instead of t.

f = 29
(N = 30)


17.27 f represents what is called the number of
degrees of freedom (or free choices).

A playing captain wants to choose
the rest of his hockey team.

In statistical symbols

......... = 11   (symbol)                            N = 11

......... = 10   (symbol)                            f = 10

Because the captain is
playing, he only has 10
'degrees of freedom' (free
choices).


17.28 This should give you some idea why
statisticians pedantically call f the
number of degrees of freedom!

z does not use s / σ / f.                            z never uses f.

t does not use s / σ / f.                            t never uses σ.


17.29 What are the criteria for using t tests?

Random samples.
Data quantitative.
Normal distribution.
N less than 30 and σ not
known.
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17 30 Although t tests can t>e used to analyse 

results from very small samples these mini
samples are not very sensitive.

However, for the sake of practice,
imagine here that the results refer to the
pain threshold for only 5 random patients
after a new analgesic. Are these results
significantly higher than the population
average, 4 units?

(I would not recommend such a small
sample!)

Patient           A   B   C   D   E

Pain Threshold    7   5   2   4   7

$\bar{X} = \dfrac{\sum X}{N} =$ ?

$s^2 = \dfrac{\sum (X - \bar{X})^2}{N - 1} =$ ?

s = ?

f = ?

Calculated $t = \dfrac{\bar{X} - \mu}{s/\sqrt{N}} =$ ?

This is a ......... tailed test with f equal to
.......... The corresponding significant
values of t = .........

What is your conclusion?


$\bar{X} = \dfrac{25}{5} = 5$

$s^2 = \dfrac{4 + 0 + 9 + 1 + 4}{4} = 4\tfrac{1}{2}$

s = 2.1

f = 4 (the denominator when
s² is calculated)

Calculated $t = \dfrac{5 - 4}{2.1/\sqrt{5}} \approx 1$

One.

4.

2.132 for .05

3.747 for .01

Calculated t is less than
significant t. Still accept
the Null Hypothesis. The
analgesic is either ineffective
or there is insufficient
evidence.
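
The same steps can be followed from the raw pain thresholds in Python. This is a sketch only: it relies on the standard library `statistics.variance`, which divides by N − 1 as the frame does, and the variable names are illustrative.

```python
import math
import statistics

thresholds = [7, 5, 2, 4, 7]   # pain thresholds for patients A to E
mu = 4                         # population average

n = len(thresholds)
mean = statistics.mean(thresholds)
variance = statistics.variance(thresholds)   # sum of squared deviations / (N - 1)
s = math.sqrt(variance)
t = (mean - mu) / (s / math.sqrt(n))
f = n - 1

print(f"mean = {mean}, s squared = {variance}, s = {s:.2f}")
print(f"calculated t = {t:.2f} with f = {f}")
```

The calculated t is then compared with the one-tailed significant values for f = 4 quoted in the frame.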
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t 7.31 Look at these result* 

A psychiatrist got exasperated when
patients 'phoned him during the night
(waking him up!) to complain of
insomnia. He heard of a new drug 'Zizz'
and decided to try it randomly on 5
patients to see whether it was effective.
He gave them alternately at random
a placebo for a week and Zizz for a
week and calculated the average number
of hours each patient slept on the
placebo and on Zizz.

Here are the results:

             Average on    Average on
             Placebo       Zizz
Patient A        2             7
Patient B        6            13
Patient C        3             6
Patient D        1             0
Patient E        4             5


Incidentally Patient B lost his job!


They can be modified so that t can be used.
The pairs of results are subtracted so that
X now refers to these differences. This
modification is consequently called
the paired t test (on differences).


Patient    Average on    Average on    Difference
           Placebo       Zizz
A              2             7            +5
B              6            13            +7
C              3             6            (a)
D              1             0            (b)
E              4             5            +1


17.32 These differences between the two drugs
are based on a ......... sample of                   Random
patients, the data is .........                      Quantitative
the differences are .........                        Normally
distributed, N equals .........                      5 (the number of differences)
and σ is/is not known.                               Is not.
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17.33 Theref ore we can use a .test on 

these differences. We treat these
differences 5, 7, 3, −1, 1 as 5 values
of a variable.

t


17.34 Perform the first stage.

The Null Hypothesis is that
the differences are only
due to chance. The alternative
is that Zizz increases the
average number of hours'
sleep.


17.35 We are going to perform the t test
using these 5 differences (called X
below).

Complete the table and calculate X̄ and
s for these differences.

Patient     X     (X − X̄)    (X − X̄)²
A           5        2
B           7        4          16
C           3
D          −1       −4
E          +1
ΣX =        Σ(X − X̄) =        Σ(X − X̄)² =

Patient     X     (X − X̄)    (X − X̄)²
A           5        2           4
B           7        4          16
C           3        0           0
D          −1       −4          16
E          +1       −2           4
ΣX = 15     Σ(X − X̄) = 0 (of course)     Σ(X − X̄)² = 40

$\bar{X} = \dfrac{\sum X}{N} = \dfrac{15}{5} = 3$

$s = \sqrt{\dfrac{\sum (X - \bar{X})^2}{N - 1}} = \sqrt{\dfrac{40}{4}} = \sqrt{10}$


17.36 f = .........

f = N − 1 = 4


17.37 Under the Null Hypothesis there is no
real difference and so theoretically
the mean difference should equal .........

0, i.e. μ = 0
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16b 

17.38 We use the same formula here as
before.

t = .........

$t = \dfrac{\bar{X} - \mu}{s/\sqrt{N}}$

(Frame 17.12 if you'd
forgotten!)


17.39 In this example

X̄ = ? (Frame 17.35)                                 X̄ = 3

μ = ? (Frame 17.37)                                 μ = 0

s = ? (Frame 17.35)                                 s = √10

N = ? (the number of differences)                   N = 5

∴ Calculated t = ?

$t = \dfrac{3 - 0}{\sqrt{10}/\sqrt{5}} = \dfrac{3}{\sqrt{2}} = 2.12$


17.40 Calculated t = 2.12

The psychiatrist is interested in one/two
tails?

One. He wants to know
whether the drug is more
effective than the placebo.

The significant value of t = .........
(f = 4, remember).

2.132 for .05

3.747 for .01


17.41 Calculated t is ......... than the
significant value of t. Therefore you                Less.
accept/reject the Null Hypothesis.                   Accept.
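
The paired t test of Frames 17.31 to 17.41 can be traced in a short Python sketch: subtract each pair, then treat the differences as a single sample tested against μ = 0. The variable names are illustrative; `statistics.stdev` uses the N − 1 denominator, as in Frame 17.35.

```python
import math
import statistics

placebo = [2, 6, 3, 1, 4]    # average hours slept on placebo, patients A to E
drug = [7, 13, 6, 0, 5]      # average hours slept on the new drug

# X now refers to the difference within each pair.
differences = [d - p for p, d in zip(placebo, drug)]

n = len(differences)
mean_diff = statistics.mean(differences)
s = statistics.stdev(differences)            # square root of 40/4
t = (mean_diff - 0) / (s / math.sqrt(n))     # mu = 0 under the Null Hypothesis
f = n - 1

print("differences:", differences)
print(f"calculated t = {t:.2f} with f = {f}")
```

Because each patient acts as his own control, only the five differences enter the calculation, and f is one less than the number of differences.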

17.42 Such tests based on the difference
between pairs of results on an individual
are called paired-t-tests. In paired-t-tests
the variable is X and we use

$t = \dfrac{\bar{X} - \mu}{s/\sqrt{N}}$

What do these symbols mean in the
context of paired-t-tests?

X is the difference between
the paired results.

X̄ is the mean of the
differences.

μ is the mean theoretical
difference = 0.

s is the standard deviation
for those differences.

N is the number of
differences.
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1 1 43 Pairing results »s a good idee if the 

results fall naturally into pairs, i.e. each
number is more closely related to its
pair than any other result.

You have a series of twins. You should/              Should.
should not arrange to treat the results              A twin is more like its
in pairs.                                            opposite number than any
                                                     other person.


17.44 Noah in his ark, had he had the time
(or the inclination!) to do a drug trial
on his animals, should/should not                    Should.
have used the paired-t-test?


17.45 When would you pair? (statistically!)

When each of the pairs is
more like each other than
the rest of the group.


17.46 [Cartoon: "I say, now should we pair?" "Are you bored with correlating?"]


17.47 Most frequently the paired t test is
used when two drugs are given to
each patient or when a specimen is
tested using two different techniques.
When results fall naturally into pairs we
can treat the ......... as the variable               differences,
and straight away use

$t = \dfrac{\bar{X} - \mu}{s/\sqrt{N}}$

where μ = ?
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17 51 Wc have to modify our usual formula 
for s² so as to include the results from
both samples. For z we could use each
separately as in

$z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2}}}$

We calculate s² as follows when using t,
pooling the squares of the deviations
from the means (the first sum is for the
1st sample, the second for the 2nd):

$s^2 = \dfrac{\sum (X - \bar{X}_1)^2 + \sum (X - \bar{X}_2)^2}{N_1 - 1 + N_2 - 1}$

Which formula for s², which you are
used to, is this pooled formula most
like?

$s^2 = \dfrac{\sum (X - \bar{X})^2}{N - 1}$


17.52 Let us take an over-simplified example.
Suppose the 1st sample is 1, 2, 3.

X      X − X̄      (X − X̄)²
1
2
3
ΣX =   Σ(X − X̄) = 0   Σ(X − X̄)² =

What is N₁?                                          N₁ = 3

What is X̄₁?                                          X̄₁ = 2

What is Σ(X − X̄)² in the 1st sample?                 Σ(X − X̄)² = 2

Suppose the 2nd sample is 0, 2, 2, 4.

X      X − X̄      (X − X̄)²
0
2
2
4
ΣX =   Σ(X − X̄) = 0   Σ(X − X̄)² =

(contd. on opposite page)
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17 5? 

contd.

What is N₂?                                          N₂ = 4

What is X̄₂?                                          X̄₂ = 2

What is Σ(X − X̄)² in the 2nd sample?                 Σ(X − X̄)² = 8

With the first sum taken in the 1st sample and the second in the 2nd sample,

$s^2 = \dfrac{\sum (X - \bar{X}_1)^2 + \sum (X - \bar{X}_2)^2}{N_1 - 1 + N_2 - 1} = \dfrac{2 + 8}{2 + 3} = 2$

$s = \sqrt{2}$

17.53 In the last chapter when we didn't
know σ we used

$z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2}}}$

We kept the two sample variances
separate. To use t we must pool the
information from the two samples and
calculate one value s², which is

$s^2 = \dfrac{\sum (X - \bar{X}_1)^2 + \sum (X - \bar{X}_2)^2}{N_1 - 1 + N_2 - 1}$


17.54 Having calculated s² we use the formula

$t = \dfrac{\bar{X}_1 - \bar{X}_2}{s\sqrt{\dfrac{1}{N_1} + \dfrac{1}{N_2}}}$

To be pedantic, unless the
variances in both samples
are approximately equal
we do something else. This
is rarely so, however, and
so we can forget about this
problem here.
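
The pooling step can be checked on the over-simplified samples of Frame 17.52 with a minimal Python sketch. The helper names `sum_of_squares` and `pooled_variance` are illustrative and do not come from the text.

```python
def sum_of_squares(sample):
    """Sum of squared deviations of a sample from its own mean."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample)

def pooled_variance(sample1, sample2):
    """Pooled s squared and its denominator f = N1 - 1 + N2 - 1."""
    f = len(sample1) - 1 + len(sample2) - 1
    return (sum_of_squares(sample1) + sum_of_squares(sample2)) / f, f

s_squared, f = pooled_variance([1, 2, 3], [0, 2, 2, 4])
print(f"pooled s squared = {s_squared}, f = {f}")
```

Each sample contributes the deviations from its own mean, so the two sums of squares (2 and 8 here) are added before dividing by the combined degrees of freedom.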
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17 55 f aiwaysequah the denominator when N, 1 > N 2 — J 

s² is calculated. Therefore to use t
when μ is not known and the sums of
squares of the deviations from the
means are pooled,

f = .........                                        f = N₁ + N₂ − 2


17.56 Here are some fictitious results.

The suggestion is that alcohol slows the
reaction time. A very small sample of
students was taken, 5 of whom were
given a reasonable amount of alcohol and
3 drank Schhh... (you know who!). As
soon as an electric bell rang each was to
press a button. The time lapse was
recorded electronically. These are the
reaction times.

Schhh...       Alcoholic

10 units       10 units
12 units       12 units
14 units       14 units
               16 units
               18 units

Perform the first step in the significance
test.


The Null Hypothesis is
that any difference is due
only to chance.
The alternative is that
alcohol slows the reaction
time.


17.57 Complete this table and calculate your
t value.

1st sample                         2nd sample
X     X − X̄     (X − X̄)²           X     X − X̄     (X − X̄)²
10                                 10
12                                 12
14                                 14
                                   16
                                   18
ΣX =            Σ(X − X̄)² =        ΣX =            Σ(X − X̄)² =

X̄₁ = ?          X̄₂ = ?
N₁ = ?          N₂ = ?
s² = ?
s = ?
f = ?


X̄₁ = 12         X̄₂ = 14

N₁ = 3          N₂ = 5

$s^2 = \dfrac{(4 + 0 + 4) + (16 + 4 + 0 + 4 + 16)}{6} = 8$

$s = \sqrt{8}$

f = 6 (the denominator)


(contd. on opposite page)
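
For readers who wish to check their own arithmetic, the two-sample t calculation for the reaction times can be sketched in Python. It assumes the pooled formula of Frame 17.54; the function name `pooled_t` is illustrative, and the printed t is a computed value, not a figure quoted from the text.

```python
import math
import statistics

def pooled_t(sample1, sample2):
    """Calculated t, f and pooled s squared for two small samples."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)
    ss1 = sum((x - mean1) ** 2 for x in sample1)
    ss2 = sum((x - mean2) ** 2 for x in sample2)
    f = n1 + n2 - 2
    s_squared = (ss1 + ss2) / f
    t = (mean2 - mean1) / (math.sqrt(s_squared) * math.sqrt(1 / n1 + 1 / n2))
    return t, f, s_squared

soft_drink = [10, 12, 14]            # the 3 students who had the soft drink
alcohol = [10, 12, 14, 16, 18]       # the 5 students given alcohol
t, f, s_squared = pooled_t(soft_drink, alcohol)
print(f"s squared = {s_squared}, f = {f}, calculated t = {t:.2f}")
```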
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1 7 61 Th • formula for calculating pooled >* 
where μ is unknown is

$s^2 = \dfrac{\sum (X - \bar{X}_1)^2 + \sum (X - \bar{X}_2)^2}{N_1 + N_2 - 2}$

Write this formula without using the
means.

$s^2 = \dfrac{\left[\sum X_1^2 - \dfrac{(\sum X_1)^2}{N_1}\right] + \left[\sum X_2^2 - \dfrac{(\sum X_2)^2}{N_2}\right]}{N_1 + N_2 - 2}$

Use this result in the frame
below.


17.62 Practical Example

In Frames 3.1 and 3.5 we have random
samples of birth weights of children of
diabetic and non-diabetic mothers. Are
the birth weights of children of diabetic
mothers significantly higher than the
control group?

Most of the arithmetic you undertook
in the Practical Example at the end of
Chapter 7.




17.63 This flow chart is included as a summary to help you decide the right test to
use in the 3 following practical examples, which are specially chosen so that
you can practise deciding which z or t test to use. Assume all the results
in the examples are random and based on normally distributed data.
First you decide whether σ or s is to be used. If σ is unknown the decision
about which test to use is based on sample size. In either case the next
decision is whether μ is known (when the test depends on the distribution
of the sample mean) or μ is unknown (when the test depends on the
distribution of the difference of two sample means).






SUMMARY 

Consider the questions in turn and decide on the test depending on the 

answers.
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SUMMARY (couldJ 

1. Two students measured the length of the caecum in 26 male and 20 female
specimens of a particular animal. They were interested to find out whether
the caecal length was significantly different in the two sexes. They calculated
the average male caecal length as 14.8 cm and that for females 13.7 cm.
What formula would they use to calculate s²?
What do the symbols in the formula represent?
They calculated s² correctly to be equal to 0.81.
What conclusion do they draw?

2. Professor X had the idea that people with cancer of the stomach ate more
than others. He paired each of his 20 cases of cancer of the stomach with
another patient with a different diagnosis but of the same age, sex, race
and social class.

He analysed the average daily intake and found that the mean difference
was 180 calories. (Those with cancer eating more.)

The standard deviation of the differences was 450 calories.

What is his conclusion?

3. A Secretary for Health wanted to know whether a higher number of car
accidents could be related to drivers with increased blood alcohol levels.

He took blood samples of 100 random drivers involved in car accidents
and the police chose 100 drivers randomly who had not been involved in
an accident. For those involved in accidents the mean alcohol level was
2.42 units with a variance of 0.39. For the control group the mean was 2.24
units with a variance of 0.25.
What conclusion would he draw?
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TESTING FOR REAL CORRELATION 

INTRODUCTION 

In Chapters 8 and 9 we learnt about correlation and how to calculate the
correlation coefficients r and ρ. By chance a +ve or −ve value of the
coefficient would usually be calculated even though in fact there existed
no real correlation. Indeed it is exceedingly rare to obtain the exact result
r or ρ = 0. This chapter shows you how to decide whether a particular value
for r or ρ is likely to be due to significant correlation or chance variation
from 0.

$1 - \dfrac{6 \sum D^2}{N(N^2 - 1)}$

What does this equal?


What is 'D' in the above equation?

The difference between
rankings.

What is the name of the other correlation
coefficient you have met?

Pearson's Correlation
Coefficient, r.

Suppose that you wished to decide
whether a value of r = +0.1
represented real correlation or just a
chance variation from r = 0.

What would your Null Hypothesis be?

That the variation from 0
was entirely due to chance.

If you were interested to detect real
negative correlation this would involve
a ......... tailed test.

Where no sign is specified a .........
tailed test is required.

The last frame required a ......... tailed
test?

One.

Two.

Two.

To test your Null Hypothesis you use

$t = \dfrac{r\sqrt{N - 2}}{\sqrt{1 - r^2}}$

(You needn't remember this formula.)

What does N represent?

The number of pairs of
results.













18.7 To use this formula r and ρ are
interchangeable. What is the formula for
testing whether a value of ρ represents
real correlation?


18.8 In this particular t test
f = N − 2.

What does f represent?


18.9 Suppose the correlation coefficient
+0.1 was calculated from 11 pairs of
results.

t = ?

r = +0.1

N = 11

f = ?

(Substitute in the formula in Frame 18.6.)


18.10 Calculated t = +0.3.

What does f equal again?


18.11 What are the corresponding significant
t values in the t table? (two-tailed test)


18.12 Is calculated t bigger than significant t?

No. 0.3 is smaller than
both 2.262 and 3.250.


18.13 Do you reject the Null Hypothesis?

No.


18.14 Do you conclude the correlation
coefficient +0.1 here represents only
chance variation from 0?

Yes, or that there is
insufficient evidence.
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18.15 

Suppose that height and weight in your
class are correlated with r = +0.6. You
wished to test whether this represented
real +ve correlation.

This is a ......... tailed test.

One.

18.16 There were 27 members of the class
measured, so

t = ?

$t = \dfrac{0.6\sqrt{25}}{\sqrt{1 - 0.36}} = \dfrac{3}{0.8} = 3.75$

f = ?

f = 25

18.17 Calculated t = 3.75. Significant t = ?
(from the tables)

1.708 at .05
2.485 at .01 (1 tail)

18.18 What is your conclusion?

There is real +ve correlation
between height and weight
in the class.

18.19 Had only 6 members of the class been
present what would your conclusion
have been? (Regarding the correlation
coefficient, that is!)

$t = \dfrac{0.6\sqrt{4}}{\sqrt{1 - 0.36}} = \dfrac{1.2}{0.8} = 1.5$

f = 4

The conclusion would now
be that the correlation was
not real or that there was
insufficient evidence.
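
The three worked correlation examples in this chapter can be recomputed with a short Python sketch of the formula t = r√(N − 2)/√(1 − r²), with f = N − 2. The function name `correlation_t` is illustrative.

```python
import math

def correlation_t(r, n_pairs):
    """Calculated t and degrees of freedom f for a correlation coefficient."""
    f = n_pairs - 2
    t = r * math.sqrt(f) / math.sqrt(1 - r ** 2)
    return t, f

# (r, number of pairs) for the examples with 11, 27 and 6 pairs of results
for r, n_pairs in [(0.1, 11), (0.6, 27), (0.6, 6)]:
    t, f = correlation_t(r, n_pairs)
    print(f"r = {r}, N = {n_pairs}: calculated t = {t:.2f}, f = {f}")
```

The same coefficient of 0.6 gives a much smaller t with 6 pairs than with 27, which is why the conclusion changes in Frame 18.19.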

18.20 Practical Example

What conclusion do you draw about your values for r and ρ calculated at the
end of Chapter 9? Do the data represent real correlation?

Summary

The formula used for testing the Null Hypothesis that there is no real
correlation is

$t = \dfrac{r\sqrt{N - 2}}{\sqrt{1 - r^2}}$ or $t = \dfrac{\rho\sqrt{N - 2}}{\sqrt{1 - \rho^2}}$

where f = N − 2.
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Chat about which tests to use 

Some readers have said that although they can understand the frames
individually they would be confused about which tests they should use if
faced with research data to analyse from scratch. This chat is intended for
people who feel that this is a problem of theirs.

In the past some medical students at this University have undertaken small
holiday research projects. Here are 4 modified examples. See whether you
would now be able to perform the necessary tests. Remember, from the
statistical point of view, it is as important to realise your limitations as to know
which tests are within your capabilities. You are not yet in a position to
advise on all their projects.


Project I 

3 students, View* Beroev. Mak»n?a 
and Trachtenberg, measured the X-ray
width of the 1st thoracic vertebra of
3 different groups of people. There
were 100 in each group. They wished to
know whether the 3 groups differed. Are
you able to perform the applicable
calculation?


Project 2

Mr Jeibect measured

(X) the average rise in the height of
the diaphragm relative to the ribs,
and

(Y) the increase in the area of the
heart and pedicle.

He made 60 of each measurement on a
group of 60 X-rays. He wanted to know
whether (X) was associated with (Y).

If given his results could you
have given him the answer?


No, except to compare the
groups two at a time using

$z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2}}}$

3 groups are properly
compared together using
the 'Analysis of Variance'
technique which you do
not know. (Incidentally,
there was no difference
found!)




Yes, I hope.
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Describe how you would have answered 
his problem.


$r = \dfrac{\sum XY - \dfrac{(\sum X)(\sum Y)}{N}}{\sqrt{\left[\sum X^2 - \dfrac{(\sum X)^2}{N}\right]\left[\sum Y^2 - \dfrac{(\sum Y)^2}{N}\right]}}$

Two tails, f = 58
lAri ,r. Icmrvt 1 


Project 3

Mr Shepherd wanted to compare the
number of Katanga tribesmen with a
whorl on the finger print of their
right thumb, with the number in the
Nanjanga tribe.

Could you work this out given the data?


No, those with whorls
were counted; it is qualitative
data.

This is the subject of the
next chapter.

(He found no difference.)


Project 4

Two exchange students from other
Universities who have since graduated,
Dr Arthur (Glasgow) and Dr Terry
(Birmingham), measured the heights of
(1) 35 eleven year old schoolboys
with goitres and (2) 30 eleven year old
schoolboys without goitres at a local
mission school. They wanted to know
whether the boys without goitres were
bigger than those with. What formula
would you use?

This is a one or two tailed test?



One. (They found that boys
without goitres were not
significantly bigger.)


CONCLUSION 

Although after completing this programme you are still limited in your
knowledge of actual tests which are in use, the basic ideas are always those
you know.

You should always understand what statistical tests are about and what
statistical conclusions mean.
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19 2 


Chapter 19 SIMPLE TESTS WITH χ²

INTRODUCTION

Most tests involving qualitative data depend on χ².


19.1 z, t, and r are all calculated from
qualitative/quantitative data.

Quantitative.


19.2 What is the distinction between
qualitative and quantitative data?

Qualitative data is counted.
Quantitative is measured.


19.3 χ is written CHI and pronounced
KI. χ² tests are called ......... tests.

Chi squared.


19.4 χ² depends on f, like .........
(symbol)


19.5 What does f represent?

The number of degrees
of freedom.


19.6 Qualitative data is usually counted
into groups or categories. An
example is blood groups. The Null
Hypothesis says that any variation
between the observed numbers in the
groups and what you would expect
is due to what?

Chance variation only.


19.7 If there is a significant difference the
variation is ......... than is expected
by chance and this suggests that some
other factor is involved.

More.


19.8 As with z and t, if the calculated value
of χ² is bigger than the significant
value you accept/reject the Null
Hypothesis.

If it is bigger, p is
smaller and the Null
Hypothesis is rejected.


19.9 Like z and t tables, χ² tables are
used directly in one/two tailed tests?

Two.
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19 10 Look at the X* tulles in the pull uui 
Like the t tables, the column headings
are ......... and the rows                           Significance levels
correspond to different values
of ......... (symbol).                               f


19.11 Compare the structure of χ² tables
with t tables. What is the important
difference?

χ² tables tabulate values for
significance levels of .99
and .95 as well as for .10,
.05, .02 and .01.


19.12 The .99 and .95 levels correspond
to the state of affairs where the
observed results differ from the
theoretical results less even than
you would expect by chance.
In 99 or 95 cases out of 100 such or
a more extreme result would occur
by chance.

What does this imply?                                The possibility of cheating.


19.13 In fact Mendel's pea observations
based on genetic theory differed
less than you would have expected
by pure chance, p > .95. Do you
think Mendel cheated?

Hint: He was an Abbot!

He didn't, in fact. It was
subsequently found that the
Abbot's gardener knew the
results the Abbot wanted
and tried to please him!


19.14 The main criteria for applying χ²
are:

a) The samples are ......... chosen.                 Randomly.

b) The data is .........                             Qualitative.

c) Ideally the lowest expected
frequency in any group is not
less than 5.

N.B. Assume all the samples
are random in this chapter.


19.15 One way of using χ² is to see whether actual
counts comply with those expected
on theoretical grounds. (Goodness
of fit to a theory.) This is so with
the example below based on
genetic theory. Are all the criteria
for applying χ² satisfied here?

Yes. The 3 criteria listed
in the previous frame are
satisfied. The lowest
expected frequency is 25.


(contd. overleaf)
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19 15 contd 

A geneticist was interested to see
whether two plants had the genotype
Aa. He crossed them to see how
close the progeny were to the
theoretical ratio

½ Aa : ¼ AA : ¼ aa.

There were 100 progeny and these
were his results (a random sample):

Genotype     Number           Number
             Observed (O)     Expected
                              in theory (E)
Aa               53               50
AA               23               25
aa               24               25
Total           100              100


19.16 In the last frame state the Null
Hypothesis and its alternative.

Null Hypothesis: The
differences are only due
to chance.

Alternative: The differences
are more than could be
reasonably expected by
chance.


19.17 $\chi^2 = \sum \dfrac{(\text{observed number} - \text{expected number})^2}{\text{expected number}}$, say $\sum \dfrac{(O - E)^2}{E}$.

There is a value of $\dfrac{(O - E)^2}{E}$ for
each class. Σ is .........
and means .........

Capital sigma.
Add together.
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19 18 In Frame 19 15 

„ . ._. w (53 - 50) 2 

Calculated X »- for 

50 gerMitype Aa 


for 

genotype A A 
lor 

genotype as 


which value? 


183 


(23 ?5| 2 124 - 251 J 


25 


25 


9 4 1 19 

50 ‘ ?5 ' 25 50 


0.38 


19.19 In Frame 19.15

Calculated χ² = 0.38

To find the corresponding significant
value of χ² we need to know ......


19.20 Where χ² is used, as in Frame 19.15,
to decide whether the actual results
'fit' some theory (in this case genetic)
f = k − 1 where k is the number of
classes.

In Frame 19.15 f = ......

3 − 1 (There are 3
classes or genotypes.)


19.21 The research worker using χ² is nearly
always interested in both tails, i.e.
he is interested in differences between
Observed and Expected results in
either direction. This is/is not the
case in Frame 19.15.

It is.

For a one-tailed test to
be applicable the research
worker must be aware of
which classes he expects
to contain fewer members
and which he expects to
contain more.


19.22 χ² tables record both tails as they
stand.

In Frame 19.15
Significant χ² (two tails) = ......

χ² = 5.991 at .05
χ² = 9.210 at .01
from the pull-out.
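The two tabulated values can be reproduced without tables in the special case f = 2, because the upper-tail probability of χ² with 2 degrees of freedom is exactly exp(−χ²/2). The sketch below relies on that standard result; the function names are illustrative assumptions and the shortcut does not apply to other values of f.

```python
import math


def chi2_upper_tail_2df(x):
    """P(chi-squared >= x) when f = 2 (closed form, valid for f = 2 only)."""
    return math.exp(-x / 2.0)


def chi2_critical_2df(alpha):
    """Value of chi-squared exceeded with probability alpha when f = 2."""
    return -2.0 * math.log(alpha)


print(round(chi2_critical_2df(0.05), 3))    # 5.991
print(round(chi2_critical_2df(0.01), 3))    # 9.21
print(round(chi2_upper_tail_2df(0.38), 2))  # 0.83
```

The last line shows that a value as large as 0.38 would arise by chance about 83 times in 100, far more often than either significance level.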
19.23 Calculated χ² = 0.38 and is less than
significant χ². What is your
conclusion?

Hint: (The same as if calculated
z or t was less than significant
z or t.)

You accept the Null
Hypothesis. The conclusion
is that the variation is
insufficient to suspect
any other factor is involved
and is due only to chance,
i.e. the results fit the
genetic theory.



19.24 f = N − 1 in ordinary t tests where
μ is known.

f = k − 1 in χ² testing 'Goodness of
fit' between actual results and results
expected according to some theory.

N in these t tests is the number
of ......, whereas in χ² k represents
the number of ......

Results.

Classes.


19.25 f = N − 1 in the paired t test also.

Here N is the number of ......

Differences.


19.26 While we have been discussing this,
the geneticist has performed another
experiment. See below.

State the Null Hypothesis and
alternative.

The Null Hypothesis is that
it is only due to chance.
The alternative is that the
variation is greater than that
expected by chance.

(contd. on opposite page)
19.29 So far the expected results were
calculated on some theoretical
grounds (genetic). Just as sometimes
for calculating z we use ......
to estimate σ, so sometimes for χ²
we use the observed results (O) to
estimate (E).

We use the observed results
like this in testing whether
one factor is associated with
another.


19.30 When we use χ² to test association
rather than to test goodness of fit
to a theory it affects the value of
f. What does f represent?

The number of degrees
of freedom.


19.31 The data to be tested for association
is arranged in a 'contingency table'.

Here is an example. Is this a table
of 'O's or 'E's?

Observed results ('O's).

In a survey to help decide whether
a particular inoculation had any
protective properties the following
results were obtained during an
epidemic:

                 Inoculated   Not Inoculated   Row Totals
Affected              5             55              60
Not Affected         95            145             240
Column Totals       100            200             300


19.32 State the Null Hypothesis and its
alternative here.

The Null Hypothesis is
that any association is only
due to chance.

The alternative is that an
association really exists
between inoculation and
incidence, inoculation
protecting.


19.33 We assume initially that the Null
Hypothesis/Alternative is true and
on this basis calculate the expected
results using the row and column
totals.

Null Hypothesis.
19.34 In Frame 19.31 using the column totals
we see that 100 out of the total
300 or 1/3 were inoculated. Assuming
the Null Hypothesis is true and that
inoculation is not really associated
with the incidence of the disease we
would expect 1/3 of those affected
to have been inoculated/not inoculated.

Inoculated,
i.e. 1/3 of the people
are inoculated and as this
is assumed to have had
no effect 1/3 of those
affected would be
inoculated.


19.35 But in Frame 19.31 we see that
a total of 60 people are affected.
Therefore we would expect
that ...... of them had been
inoculated.

1/3 or 20.


19.36 Similarly 2/3 of the total are not
inoculated and so you would expect
2/3 of those 60 affected, i.e. 40
people, to be affected and
inoculated/not inoculated.

Not inoculated.


19.37 As inoculation is assumed to have
no effect and 1/3 are inoculated you
would expect also 1/3 of the 240
not affected to be inoculated,
i.e. you would expect ......
inoculated, not affected people.

1/3 × 240 = 80


19.38 The expected results calculated
in the last 3 frames are shown below.
How many not inoculated, not
affected people would you expect?

                 Inoculated   Not Inoculated
Affected             20             40
Not affected         80              ?

2/3 of those not affected,
i.e. 2/3 of 240 = 160
19.39 The contingency tables for the
observed results and for the expected
results are shown below:

Observed (O)

                 Inoculated   Not Inoculated   Row Total
Affected              5             55              60
Not Affected         95            145             240
Column Total        100            200             300

Expected (E)

                 Inoculated   Not Inoculated   Row Total
Affected             20             40              60
Not Affected         80            160             240
Column Total        100            200             300

What do you notice about the row
totals and column totals in each table?

They are the same in both
tables.


19.40 Also notice that each expected result
equals

(its row total × its column total) / the overall total

e.g. for the inoculated affected group
in Frame 19.39 the expected result

= (? × 100) / ? = 20

(60 × 100) / 300 = 20


19.41 How can you calculate the expected
frequencies in contingency tables?

Use the formula

(row total × column total) / grand total


19.42 Give the three criteria for applying
χ².

The data is ...... and the
samples are ...... and no expected
frequency is less than ......

Qualitative.

Random.

5
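The formula of Frame 19.41 can be applied to a whole table at once. This is a minimal sketch: `expected_table` is an illustrative name, and the observed counts are those of the inoculation survey in Frame 19.31.

```python
def expected_table(observed):
    """Expected count for each cell: row total x column total / grand total."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in column_totals] for r in row_totals]


# Rows: affected, not affected. Columns: inoculated, not inoculated.
observed = [[5, 55],
            [95, 145]]
expected = expected_table(observed)
print(expected)  # [[20.0, 40.0], [80.0, 160.0]]

# Third criterion of Frame 19.42: no expected frequency below 5.
smallest = min(min(row) for row in expected)
print(smallest >= 5)  # True
```

Because every expected count is built from the margins, the row and column totals of the expected table necessarily match those of the observed table.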
19.43 Can χ² be applied to our inoculation
data here?

Observed (O)

                 Inoculated   Not Inoculated   Row Total
Affected              5             55              60
Not affected         95            145             240
Column total        100            200             300

Expected (E)

                 Inoculated   Not Inoculated   Row Total
Affected             20             40              60
Not affected         80            160             240
Column total        100            200             300

Yes, no expected result is
less than 5, if the samples
are random.


19.44 Remember we were interested to see
whether the inoculation protected
against the disease. We expect, if this
is so, to observe fewer inoculated
affected people than expected. Are
there?

Is this a one or a two-tailed test?

Yes.
20 were expected but only
5 were observed in this
group.

A one-tailed test.

19.45 What is the formula for calculating
χ²?

χ² = Σ (O − E)² / E


19.46 In Frame 19.43

χ² = (5 − 20)²/20 + ? + ? + ?

   = ?

(55 − 40)²/40 + (95 − 80)²/80 + (145 − 160)²/160

= 3375/160

= 21
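As a check on Frame 19.46, the four terms can be computed cell by cell. The sketch below assumes only the observed and expected tables of Frame 19.43; the variable names are illustrative.

```python
# Rows: affected, not affected. Columns: inoculated, not inoculated.
observed = [[5, 55], [95, 145]]
expected = [[20, 40], [80, 160]]

# One (O - E)^2 / E term for every cell in the body of the table.
contributions = [
    [(o - e) ** 2 / e for o, e in zip(o_row, e_row)]
    for o_row, e_row in zip(observed, expected)
]
chi2 = sum(sum(row) for row in contributions)

print(contributions)  # [[11.25, 5.625], [2.8125, 1.40625]]
print(chi2)           # 21.09375
```

Every cell has (O − E)² = 225, so the sum is 225 × (1/20 + 1/40 + 1/80 + 1/160) = 3375/160, which the frame rounds to 21.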


19.47 When χ² was used to test
'Goodness of fit' to a theory f = ......

k − 1
19.48 In using χ² to test association in a
contingency table f = (r − 1)(c − 1) where
r is the number of rows and c is the
number of columns in the body of
the table.

In Frame 19.43, f = ?

(2 − 1)(2 − 1) = 1


19.49 i.e. There is 1 degree of freedom.

This is because if 1 expected result
is calculated in a 2 rowed and 2 columned
contingency table, as the row and
column totals are fixed, the rest
of the numbers in the table cannot
be chosen freely.

e.g. Complete this fictitious table:

                 B     Not B   Row Total
A               10       ?        40
Not A            ?       ?        85
Column Total    50      75       125

i.e. there is only 1 freedom
(1 degree of freedom).

                 B     Not B   Row Total
A               10      30        40
Not A           40      45        85
Column Total    50      75       125


19.50 Anyway, to come back to the
inoculation problem.

In Frame 19.44 you decided this was
a ......-tailed test.

One.

In Frame 19.48 you calculated f
to equal ......

One.

What are the required significant
χ² values in the table?

2.706 for .05
5.412 for .01


19.51 In Frame 19.46 χ² was calculated from
the contingency tables to equal 21.
What is your conclusion?

The Null Hypothesis is
rejected. Inoculation
protects significantly
(.01 > p).


19.52 When χ² is used to test 'Goodness
of fit' to a theory (e.g. genetics)
f = ?

k − 1 where k is the number
of classes.
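The two rules for f, and the reason a 2 × 2 table has only one degree of freedom, can be sketched as follows. All function names are illustrative assumptions; the numbers are those of the fictitious table in Frame 19.49.

```python
def f_goodness_of_fit(k):
    """Degrees of freedom when testing fit to a theory with k classes."""
    return k - 1


def f_contingency(rows, columns):
    """Degrees of freedom for the body of a rows x columns contingency table."""
    return (rows - 1) * (columns - 1)


def complete_2x2(top_left, row_totals, column_totals):
    """Fill a 2 x 2 table from one cell and the fixed margins."""
    top_right = row_totals[0] - top_left
    bottom_left = column_totals[0] - top_left
    bottom_right = row_totals[1] - bottom_left
    return [[top_left, top_right], [bottom_left, bottom_right]]


print(f_goodness_of_fit(3))                     # 2
print(f_contingency(2, 2))                      # 1
print(f_contingency(3, 8))                      # 14
print(complete_2x2(10, (40, 85), (50, 75)))     # [[10, 30], [40, 45]]
```

Once the single entry 10 is known, the other three cells follow by subtraction from the margins, so only one number could be chosen freely.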
19.57 χ² = Σ (O − E)² / E

= ? in the last frame?

(25 − 16)²/16 + (15 − 24)²/24 + (15 − 24)²/24 + (45 − 36)²/36

= 14


19.58 Assuming this is a two-tailed test with
f = 1

The significant value of χ²
from the Tables = ?

3.841 for .05
6.635 for .01


19.59 What is your conclusion?

Calculated χ² is bigger.
The Null Hypothesis is
rejected. There is a significant
association between eye
colour and hair colour
(.01 > p).


19.60 What is the value of f in a
3 row × 8 column contingency
table for testing for association?

(3 − 1)(8 − 1) = 14


19.61 By completing the answers below
decide whether you think that
knowledge of bilharzia protects
children from risking contracting
the disease.

Here are the results obtained by
Dr V.



fit » 

Sarf* 

OikmJ 

Atm 


*C'V»vViyr 


Ali*Jir<ai4r 

7,-*w 

l*»Ar i » 

Ml 

IQ 

*> 

HD 

Ah# 

40 

Oil 

76 

IJO 

Ciitu*T 

Tf*& 

m 

• M 

m 

JO) 


State the Null Hypothesis and
alternative, calculating the expected
table using the formula:

E = ......

The Null Hypothesis is that
any association is only due
to chance.

The alternative is that there
is an association between
knowledge and risk.

E = (row total × column total) / grand total

(contd. on opposite page)
19.63 What is the formula used for f in
testing for association?

19.64 What is the value of f if χ² is
used for testing 'Goodness of
fit' to theory?

19.65 What are the criteria for employing
a χ² test?


19.66 If some values of E are much less
than 5 and the contingency table is
large how can you sometimes
overcome this obstacle?


19.67 What is the formula for χ²?


19.68 Practical Example

Mr L. P. wants to know whether
malignancy is associated with the
site of cerebral tumour.

His results were:

Hi.’ n.jft 


V tOM.ll I'tflt 


to 

40 

Temporal to*i« 

n 

2 

» 

Oitiar 

51 

29 

80 


too 

50 

too 


What conclusion would he draw?

The samples are random.
The data is qualitative.
No E is less than 5.

By pooling some classes
as we did in Frame 19

χ² = Σ (O − E)² / E

SUMMARY

χ² (chi squared) is the distribution used for testing data where:

1 The samples are random.

2 The data is qualitative.

3 There is ideally no expected value less than 5.

Calculated χ² = Σ (O − E)² / E, where O is an observed and E an expected result.

For two-tailed tests the .05 and .01 columns in the χ² tables are used
directly. For one-tailed tests columns .10 and .02 are used for significance
levels of .05 and .01. If it is found that the calculated value of χ² is less
than the .95 and .99 values it suggests the possibility of cheating.

f = k − 1 where k is the number of classes if testing 'Goodness of fit' to a theory.

Where χ² is used to test for associations in a contingency table, the
expected results are calculated using

E = (row total × column total) / grand total

Here f = (r − 1)(c − 1) where r is the number of rows and c is the number
of columns in the body of the contingency table.

The Null Hypothesis states that the differences between the observed and
expected results are only due to chance variation. If the calculated value
of χ² is greater than the significant value, the Null Hypothesis is rejected.

Note 

You have done well to complete this programme, particularly if you have
not sneaked a look at the answers before attempting to solve the frames
yourself. This is the book concluded. Primarily you should be able to
understand what most statistical jargon is about. I hope you can also
perform simple tests for yourself. One of the aims is to have shown you what
you cannot yet cope with. The keener people ought to be able to understand
other statistical books by now. However, one of the best lessons to remember
when doing research is that if in doubt as to how to analyse your data, and
statistics is involved, ask advice before collecting the data.
23 Country of Origin of Doctors Practising in Country X in 1967. Data
Collected by Professor W. F. Ross, 1967.
5.29

        X    (X − X̄)   (X − X̄)²      X     X²
 1      2      −3          9         2      4
 2      4      −1          1         4     16
 3      5       0          0         5     25
 4      3      −2          4         3      9
 5      6      +1          1         6     36
 6      9      +4         16         9     81
 7      9      +4         16         9     81
 8      1      −4         16         1      1
 9      0      −5         25         0      0
10     11      +6         36        11    121

Σ(X) = 50                           Σ(X) = 50
X̄ = 5
Σ(X − X̄)² = 124                    Σ(X²) = 374

s² = Σ(X − X̄)² / (N − 1) = 124/9 = 13.8

s² = [Σ(X²) − (ΣX)²/N] / (N − 1) = (374 − 50²/10) / 9 = (374 − 250)/9 = 124/9 = 13.8
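Both routes to s² in answer 5.29 can be checked in a few lines. This is a sketch using the ten values of X from the table; the variable names are illustrative assumptions.

```python
data = [2, 4, 5, 3, 6, 9, 9, 1, 0, 11]
n = len(data)
mean = sum(data) / n

# Route 1: sum of squared deviations from the mean.
ss_deviations = sum((x - mean) ** 2 for x in data)

# Route 2: sum of squares minus (sum)^2 / N, which avoids the deviations.
ss_shortcut = sum(x * x for x in data) - sum(data) ** 2 / n

variance = ss_deviations / (n - 1)
print(mean, ss_deviations, ss_shortcut)  # 5.0 124.0 124.0
print(round(variance, 2))                # 13.78
```

The second route needs only the two column totals Σ(X) and Σ(X²), which is why it is the quicker method when working by hand.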
Frame 3.1

   X     X − X̄    (X − X̄)²
  103     −31        961
  114     −20        400
  114     −20        400
  122     −12        144
  131      −3          9
  138      +4         16
  138      +4         16
  138      +4         16
  143      +9         81
  146     +12        144
  151     +17        289
  170     +36       1296

ΣX = 1608     Σ(X − X̄) = 0     Σ(X − X̄)² = 3772

N = 12

X̄ = 1608/12 = 134

s² = 3772/11 = 342.9

s = 18.5

Frame 3.3

   X       X²
   52     2704
   79     6241
   80     6400
  100    10000
  103    10609
  104    10816
  104    10816
  106    11236
  109    11881
  111    12321
  120    14400
  121    14641
  127    16129
  149    22201
  150    22500
  162    26244

ΣX = 1777     ΣX² = 209139

N = 16

X̄ = 1777/16 = 111.0625

s² = [ΣX² − (ΣX)²/N] / (N − 1) = (209139 − 1777²/16) / 15 = 785.4

s = 28.0
9.44 (1) Scatter Diagram

[Scatter diagram: Y (vertical axis, 0 to 12) plotted against X (horizontal axis, 0 to 10)]

(2) Estimate of the Correlation Coefficient
could be +0.3


(3) Calculation of r.

   X      Y      X²      Y²      XY
   1      3       1       9       3
   2      5       4      25      10
   3      5       9      25      15
   5      2      25       4      10
   5      4      25      16      20
   5      6      25      36      30
   7      7      49      49      49
   7     10      49     100      70
   9      8      81      64      72
  10     12     100     144     120

Σ(X) = 54    Σ(Y) = 62    Σ(X²) = 368    Σ(Y²) = 472    Σ(XY) = 399
(contd.)

N = 10

r = [Σ(XY) − (ΣX)(ΣY)/N] / √{[Σ(X²) − (ΣX)²/N] [Σ(Y²) − (ΣY)²/N]}

∴ r = (399 − 54 × 62/10) / √[(368 − 54²/10)(472 − 62²/10)]

∴ r = (399 − 334.8) / √[(368 − 291.6)(472 − 384.4)]

∴ r = 64.2 / √(76.4 × 87.6) = 64.2 / √6692.64 = 64.2/81.8

    = +0.78
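The calculation of r above can be reproduced from the ten pairs of results. The sketch below uses the same sums-of-products formula; `pearson_r` is an illustrative name, not one used in the text.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient from the raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    sxx = sum(x * x for x in xs) - sum_x ** 2 / n
    syy = sum(y * y for y in ys) - sum_y ** 2 / n
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 5, 5, 5, 7, 7, 9, 10]
y = [3, 5, 5, 2, 4, 6, 7, 10, 8, 12]

r = pearson_r(x, y)
print(sum(x), sum(y))  # 54 62
print(round(r, 2))     # 0.78
```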
(4) Calculation of ρ

   X      Y     Ranked X   Ranked Y     D       D²
   1      3       10          9        +1       1
   2      5        9          6½       +2½      6¼
   3      5        8          6½       +1½      2¼
   5      2        6         10        −4      16
   5      4        6          8        −2       4
   5      6        6          5        +1       1
   7      7        3½         4        −½       ¼
   7     10        3½         2        +1½      2¼
   9      8        2          3        −1       1
  10     12        1          1         0       0

               Sum to 55  Sum to 55  Σ(D) = 0  Σ(D²) = 34

9.44 (contd.)

N = 10

ρ = 1 − 6Σ(D²) / [N(N² − 1)]

  = 1 − (6 × 34) / (10 × 99)

  = 1 − 34/165

  = 0.79

r = +0.78 and ρ = 0.79

which are approximately the same.

Of course r is the more accurate estimate as it uses all the information.
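The ranking step is where slips are most likely, so a sketch may help. It assumes the convention visible in the table above: rank 1 goes to the largest value and tied values share the average of the ranks they occupy. The function names are illustrative.

```python
def average_ranks(values):
    """Rank 1 = largest value; tied values share the mean of their ranks."""
    ordered = sorted(values, reverse=True)
    rank_of = {}
    for v in set(values):
        first_position = ordered.index(v) + 1
        ties = ordered.count(v)
        rank_of[v] = first_position + (ties - 1) / 2
    return [rank_of[v] for v in values]


def spearman_rho(xs, ys):
    """rho = 1 - 6 * sum(D^2) / (N * (N^2 - 1)), D = difference between ranks."""
    n = len(xs)
    d_squared = [(rx - ry) ** 2
                 for rx, ry in zip(average_ranks(xs), average_ranks(ys))]
    return 1 - 6 * sum(d_squared) / (n * (n ** 2 - 1)), sum(d_squared)


x = [1, 2, 3, 5, 5, 5, 7, 7, 9, 10]
y = [3, 5, 5, 2, 4, 6, 7, 10, 8, 12]

rho, sum_d_squared = spearman_rho(x, y)
print(average_ranks(x))  # [10.0, 9.0, 8.0, 6.0, 6.0, 6.0, 3.5, 3.5, 2.0, 1.0]
print(sum_d_squared)     # 34.0
print(round(rho, 2))     # 0.79
```

Each column of ranks sums to 55, as in the table, which is a useful check that ties have been averaged correctly.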


11.62 You ought to have thought of most of the following points.

1) Definition of the population

He must carefully and exhaustively define his population so that those
included and excluded are obvious. For example, is he only running
his trials on adults, or females? How is he going to define overweight? Is he
going to attempt to exclude people with renal or hormonal disease and if
so, how?

2) Factors affecting precision

Is he going to weigh patients dressed only in a gown provided at the time?
What decisions is he going to make about diet? Over what period of time
will he measure the decrease in weight? How many patients will he include?

3) Factors affecting bias

The trial will fortunately be prospective and objective. Random samples
must be allotted. If the patients are allotted numbers consecutively as
they enter the trial the numbers can previously have been allotted to the
different treatments using tables of random numbers. One group will be
the control group on a placebo.

4) Other Factors

Check that the result will be analysable before starting (you will learn
which test is suitable soon). Decide before starting what will be
done with patients who drop out of the trials. Decide what records of
drug side-effects must be kept and what will be done with them.
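The allocation described under point 3 can be prepared in advance. The sketch below is only an illustration: it swaps the printed table of random numbers for a pseudo-random generator, assumes two equal-sized groups, and the names `allocate_treatments`, "drug" and "placebo" are hypothetical.

```python
import random


def allocate_treatments(n_patients, treatments=("drug", "placebo"), seed=None):
    """Map consecutive patient numbers to treatments in a random order.

    The schedule is drawn up before any patient enters the trial, so the
    order of arrival cannot influence which treatment a patient receives.
    """
    rng = random.Random(seed)
    # Equal-sized groups (an assumption): repeat the treatments, then shuffle.
    schedule = [treatments[i % len(treatments)] for i in range(n_patients)]
    rng.shuffle(schedule)
    return {number: treatment
            for number, treatment in enumerate(schedule, start=1)}


allocation = allocate_treatments(20, seed=1967)
for patient_number in (1, 2, 3):
    print(patient_number, allocation[patient_number])
```

Fixing the seed makes the schedule reproducible, so it can be kept by someone not involved in assessing the patients.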
This is a one-tailed test as the surgeon's wife was only interested in the
physician's prowess at tossing tails.

The answer depends on the significance level.

For, the probability of tossing 1 tail is ½ = .5

For, the probability of tossing 2 tails is (½)² = .25

For, the probability of tossing 3 tails is (½)³ = .125

For, the probability of tossing 4 tails is (½)⁴ = .0625

For, the probability of tossing 5 tails is (½)⁵ = .03125

∴ Considering the .05 significance level, p is less than .05 at 5 throws and
she should have persuaded her husband to stop then.
However, for the significance level of .01, p only becomes less than this
value at the seventh throw.
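The point at which p first falls below each significance level can be found by repeated halving. A minimal sketch, with `throws_needed` as an illustrative name:

```python
def throws_needed(significance_level):
    """Smallest number of consecutive tails whose probability is below the level."""
    throws = 1
    while 0.5 ** throws >= significance_level:
        throws += 1
    return throws


for level in (0.05, 0.01):
    n = throws_needed(level)
    print(level, n, 0.5 ** n)
# 0.05 5 0.03125
# 0.01 7 0.0078125
```

Six tails in a row have probability .015625, still above .01, which is why the seventh throw is needed at that level.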


46 The Null Hypothesis states that there is no significant difference. The
alternative is that B₁₂ increases weight gain.


Calculated z 


= Xi - X 2 

/TO 

V N, N; 
5.5 - 5.2 


A? 

V 60 

03 
/ 3 65 

V 60 


I 


1.4* 

50" 


- 1 15 


This is a 1-tailed test.

The equivalent z values are 1.6 (.05) and 2.3 (.01).

Calculated z is less than these significant values of z and
p is greater than the significance levels

(p > .05)

Therefore either B₁₂ does not increase the weight gain or there is
insufficient evidence.
17.62 The Null Hypothesis states that there is no significant difference.
It is a one-tailed test.

X̄₁ for Diabetic Mothers is 134
N₁ = 12

X̄₂ for Non-Diabetic Mothers is 111.0625
N₂ = 16

f = N₁ + N₂ − 2 = 26

t = (X̄₁ − X̄₂) / [s √(1/N₁ + 1/N₂)]

where s² = {[Σ(X₁²) − (ΣX₁)²/N₁] + [Σ(X₂²) − (ΣX₂)²/N₂]} / (N₁ + N₂ − 2)

         = [(3772) + (209139 − 197358)] / 26

         = 15553/26

         = 598.2

s = 24.5

Note: Σ(X₁ − X̄₁)² = Σ(X₁²) − (ΣX₁)²/N₁

t = (134 − 111.0625) / [24.5 √(1/12 + 1/16)]

  = 2.45

Calculated t = 2.45

Tabulated t with f = 26 and using one tail is 1.706 for a significance level
of .05 and 2.479 for the .01 significance level.

.05 > p > .01

The conclusion is that the birth weights of children of diabetic mothers
are significantly bigger than the control group at the .05 significance
level but not at the .01.
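The pooled calculation in 17.62 can be retraced from the summary figures alone. This sketch uses only numbers quoted in the answer (the sums come from Frames 3.1 and 3.3); the variable names are illustrative assumptions.

```python
import math

# Diabetic mothers (Frame 3.1): N, mean, sum of squared deviations.
n1, mean1, ss1 = 12, 134.0, 3772.0
# Non-diabetic mothers (Frame 3.3): N, sum of X, sum of X squared.
n2, sum2, sum_sq2 = 16, 1777.0, 209139.0

mean2 = sum2 / n2
ss2 = sum_sq2 - sum2 ** 2 / n2          # sum of squared deviations, group 2

f = n1 + n2 - 2                          # degrees of freedom
pooled_variance = (ss1 + ss2) / f
s = math.sqrt(pooled_variance)
t = (mean1 - mean2) / (s * math.sqrt(1 / n1 + 1 / n2))

print(f, round(pooled_variance, 1), round(s, 1))  # 26 598.2 24.5
print(round(t, 2))
```

Carried at full precision t comes out at about 2.46; the 2.45 in the answer follows from rounding s to 24.5 first. Either way it lies between 1.706 and 2.479, so the conclusion is unchanged.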
17.63 Question 1

σ unknown, N < 30, μ unknown

Use t = (X̄₁ − X̄₂) / [s √(1/N₁ + 1/N₂)]

where s² = [Σ(X − X̄₁)² + Σ(X − X̄₂)²] / (N₁ + N₂ − 2)

X̄₁ = 14.8; X̄₂ = 13.7; s² = .81; N₁ = 25; N₂ = 20

∴ Calculated t = (14.8 − 13.7) / [.9 √(1/25 + 1/20)] = 1.1/.27 = 4.07

f = N₁ + N₂ − 2 = 43 which is not shown in the t tables in the pull-out.
(If f = 29 or bigger the value of t does not change much and we use the
bottom line.)

This is a two-tailed test.
Significant t = 2.000 (.05) or 2.600 (.01)
Calculated t > Significant t

There is a significant difference between the results (.01 > p)


Question 2

σ unknown, N < 30, Paired results (matched)

Use t = (X̄ − μ) / (s/√N) on the paired results

X̄ = 180   s = 450   N = 25   μ = 0

∴ t = (180 − 0) / (450/√25) = 2

This is a one-tailed test with f = 24

Significant t = 1.711 (.05) and 2.492 (.01)

These results can be summarised .05 > p > .01

People with cancer of the stomach ate significantly more at .05 but not
at .01 significance level.
Question 3


σ unknown, N > 30, μ unknown


.., • *i^j 



Calculated 7 - 


2.74 s } , - 39 

100 

2 42 - 2 24 



Sj J - .25 


JI8 

08 


225 




This is a one-tailed test.

Significant z = 1.6 (.05) or 2.3 (.01)
i.e. .05 > p > .01

At the .05 level you accept that the rate of accidents was significantly
affected by the alcohol level but at the .01 level you conclude that either
there is no effect or insufficient evidence.


18.20 The Null Hypothesis is that there is no real correlation. It is a two-tailed
test.

Tabulated t (f = 8) is 2.306 (.05) or 3.355 (.01)

Calculated t = (.78 × √8) / √(1 − .78²)

             = (.78 × 2.8) / √(1 − .6084)

             = 2.18/.62

             = 3.5

Calculated t is bigger than significant t at both levels (.01 > p).
The conclusion is that correlation is real.

These results may well be summarised in a medical journal:
r = +0.78. This is evidence of real correlation (t = 3.5, .01 > p).
(0.79 may be substituted for ρ to reach the same conclusion.)
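The significance test for r in 18.20 is a one-line formula, t = r √(N − 2) / √(1 − r²) with f = N − 2. A sketch, with `t_from_r` as an illustrative name and the tabulated values copied from the answer:

```python
import math


def t_from_r(r, n_pairs):
    """t statistic for testing whether a correlation coefficient is real."""
    return r * math.sqrt(n_pairs - 2) / math.sqrt(1 - r ** 2)


t = t_from_r(0.78, 10)
print(round(t, 1))           # 3.5

# Tabulated two-tailed values for f = 8, as quoted in the answer.
significant_05, significant_01 = 2.306, 3.355
print(t > significant_05, t > significant_01)  # True True
```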
HOW MUCH HAVE YOU LEARNT?


This is to test your knowledge. The answers are given at the end of the test.

Questions                                    Answers

1. a) Give an example of quantitative (or
continuous) data.

b) Give an example of qualitative (or
discrete) data.

c) Give two reasons why the distinction
is important in statistics.

2. Of 200 births in the Lady Chatterley's
Maternity Home last year, 90 were female.

a) What was the ratio of males to females?

b) What was the proportion of females?

c) What was the percentage of females?

3. Is

the number of sailors killed at Trafalgar
-------------------------------------------
the number of sailors involved at Trafalgar

a ratio, a rate, a proportion or a percentage?

4. Make a rough sketch
of a) a histogram
and b) a pie diagram.

5. When would you use a frequency polygon
to present data?

6. This is a distribution of data.

a) What is it called?

b) Mark in the position of the mean.

c) What is the length of AB called?

(B is a point of inflection)
7. What is the difference between
Σ(X²) and (ΣX)²?

8. Calculate the mean, median and mode
of the following distribution:
1, 2, 2, 2, 3, 5, 5, 6, 6, 18

9. Why is the mean a better measure of the
middle than either the median or the
mode?

10. Why is the sum of the deviations from
the mean not used as a measure of
variation?

11. Σ means "the sum of"

X is an observed result.

X̄ is the mean.

s is the standard deviation.

N is the number of results.

What is wrong with the following
equation?

s = Σ(X − X̄)² / (N − 1)

12. The variance of 1, 2, 3, 3, 4, 5 is 2.
What value has the standard deviation?
What value has the range?

13. Which is the better measure of variation,
the range or the standard deviation?
Why?

14. Why is this not a good diagram?

[Diagram: Percentage pass rate in Anatomy at this
Medical School (1960-64 inclusive)]
15. What do you understand by correlation?

16. What are the maximum and minimum
numerical values of a correlation
coefficient? What is the value when
there is no correlation?

17. The diagram below shows corresponding
values for X and Y.

What is the value of the correlation
coefficient here?

[Diagram: Y plotted against X]

18. What is the main difference between
Pearson's Correlation Coefficient and
Spearman's?

19. ρ is Spearman's Correlation Coefficient.

ρ = 1 − 6Σ(D²) / [N(N² − 1)]

N is the number of pairs of results.

What does D represent?

20. s is a statistic and σ is a parameter and
both represent the standard deviation.
What is the difference in the meaning of
these 2 symbols?

21. What do you understand by "bias"?
What method would you use to reduce
bias?

22. Precision can be used to describe how
close various estimates of a population
mean are to each other. How would you
improve the precision of a sample?
23. You are interested in the I.Q.'s of this
year's 1st Year Medical Students at
your University as opposed to students
in other faculties. Define your control
group exactly.

24. You are told that the probability equals
one.

What does this mean?

25. a) What is the probability of throwing
a 2 or a 6 with a "dice"?

b) What is the probability of throwing
a 2 and a 6 with two dice?

26. What type of sampling distribution
is this?

27. There are many possible normal distributions.
These can be standardised to a
single normal distribution called the
standard normal distribution (shown
below). Label the arrows.
29. a) What is the special name for the
standard deviation of the sampling
distribution of the mean? b) How
is it related to the population standard
deviation?

30. What is the use of a significance
(or confidence) level?

31. What do you understand by the meaning
of the term "Null Hypothesis"?

32. An article in a journal states
"p > .05". What conclusion would you
draw about the Null Hypothesis?

33. What makes you decide to use a one
as opposed to a two-tailed test?

34. Give the difference between the uses
of z, the standard normal deviate, and
Student's t (t test) with respect to

a) the sample size
b) s² and σ²
c) the number of degrees of
freedom, f?

35. If you know the formula for calculating
s² how can you use it to calculate the
number of degrees of freedom?
ANSWERS


1. a) Haemoglobin level. Bladder volume.

b) Sex. Blood groups.

c) The different types of data are presented differently and are subjected to
different tests.

2. a) 110/90 or 11/9

2. b) 90/200 or 9/20

2. c) 45%

3. Proportion

5. When the data is quantitative, especially when two sets of data are to be
illustrated on the same diagram.

6. a) The normal distribution.

b) The Mean

c) The standard deviation.

7. Σ(X²) is the sum of the numbers already squared.
(ΣX)² is the square of the numbers already summed.

8. The mean = 5

The median = 4

The mode = 2

9. It uses all the information (and can be used further in significance tests).

10. It always equals zero.
11. s² = Σ(X − X̄)² / (N − 1)   or   s = √[Σ(X − X̄)² / (N − 1)]

12. The standard deviation = √2

The range = 4

13. The standard deviation because it uses all the information (and can be used
further in significance tests).

14. The zero is suppressed and the line is extrapolated.

15. Association.

16. The maximum value = +1
The minimum value = −1
When there is no correlation the value = 0

17. −1

18. Pearson's uses the actual results and Spearman's the ranks.

19. D is the difference between ranks.

20. s is the standard deviation in a sample. σ is the standard deviation in the
population.

21. Bias is the off-target effect of statistics.

Randomisation, "blind" sampling are examples.

22. Increase its size.

23. A random sample of I.Q.'s of this year's 1st year students in other faculties
at your university.

24. It is inevitable.

25. a) 1/3

b) 1/18

26. The distribution of the difference between two sample means.

27. a) The mean = 0   b) The standard deviation = 1

29. The standard error.

It equals the population standard deviation
divided by √N.
30. It enables decisions to be made.

31. Any apparent difference is due only to chance variation.

32. It is either true or there is as yet insufficient evidence
to reject it.

33. The alternative hypothesis being only concerned with one outcome.

34. t is used

a) with small samples (say N less than 30)

b) with s² and

c) depends on the number of degrees of freedom, f.

z is used

a) with large samples (say N more than 30) and with smaller samples only if
σ is known.

b) It can be used with s² or σ² in large samples and

c) It does not depend on the number of degrees of freedom.

35. It equals the denominator.

36. 3.182.

37. The samples are random.

The data is qualitative.

There is ideally no expected value less than 5.

38. a)

             A     Not A
B           12      18       30
Not B       28      42       70
            40      60      100

b) f = 1

39. a) Reject the Null Hypothesis. Accept the Alternative.

b) Either the Null Hypothesis is true or there is insufficient evidence to reject it.

40. Suspect cheating.



Pr MiyJ by Pr»tnHt<ourMr» bv T ft A ContOM* Lid tdmtv« B h 
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